Description



The present invention relates to a processor, such as a semiconductor device, for executing addition and
multiplication operations, to an operation method used in the device, and to a data processor in which the processor is used.

In recent years, the operation speeds of adders and multipliers have increased significantly with remarkable developments
in semiconductor manufacturing technology, driven by miniaturization, and in semiconductor circuit technology, including
algorithms. Such arithmetic processing is used in every kind of semiconductor device, including central processing
units (CPUs) and digital signal processors (DSPs). As these technologies develop, however, arithmetic processing is
required to deliver still higher performance, that is, higher speed.

Particularly in fields of the multimedia age that require image processing and a tremendous amount of calculation,
such as matrix operations, high-speed processing is required. In particular, the processing of an adder or a multiplier
is one of the most important factors determining performance, and it must be performed at higher speed.

As an example of an adder using a current operation method, the adder described in "Design of CMOS VLSI"
(supervised by Takuo Sugano, Baifukan) is explained below.

For an addition of two binary numbers, let X and Y denote the binary numbers, S the sum of X and Y, and C the
carry. If X and Y each have a single place, there are the following four cases:

When X=0 and Y=0, S=0 and C=0.
When X=0 and Y=1, S=1 and C=0.
When X=1 and Y=0, S=1 and C=0.
When X=1 and Y=1, S=0 and C=1.

If the sum S and the carry C are expressed by logical expressions, treating the above as a truth table, the
expressions $S = X \oplus Y$ and $C = X \cdot Y$ are obtained. They can be implemented in a two-input two-output circuit
based on a single exclusive OR and a single AND, as shown in Fig. 41A. A circuit having this function is called a half adder.

If the binary numbers each have multiple places, in other words, if each has a bit width of two or more bits,
a carry signal from the lower place must be processed. Accordingly, the processing needs a circuit in
which three binary numbers, $X_i$, $Y_i$, and $C_{i-1}$, can be added for each place. This three-input two-output circuit
is called a full adder. Fig. 41C shows a truth table and a logical expression representing its operation. A circuit for
performing an addition of any number of places can be obtained by arranging the required number of full adders and
connecting them so that the carry signal of a lower adder is entered into the next upper adder. It is called a ripple
carry adder. An example formed as a four-bit adder is illustrated in Fig. 41B. Although a variety of circuits for a
single-bit full adder correctly reflect the action of the truth table in Fig. 41C, the key to designing for high-speed
operation lies not in creating the sum signal but in transmitting the carry signal entered from a lower place to the
upper place as quickly as possible. Fig. 41D shows an example of a full adder designed from this viewpoint.
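
The half adder, full adder, and ripple carry adder described above can be modelled in software roughly as follows. This is a minimal illustrative sketch in Python, not the circuits of Figs. 41A to 41D; the function names and the construction of the full adder from two half adders and an OR are assumptions chosen for illustration.

```python
def half_adder(x, y):
    """Return (sum, carry) for two single bits: S = X xor Y, C = X and Y."""
    return x ^ y, x & y


def full_adder(x, y, c_in):
    """Add three single bits; built here from two half adders (an assumption)."""
    s1, c1 = half_adder(x, y)
    s, c2 = half_adder(s1, c_in)
    return s, c1 | c2


def ripple_carry_add(x, y, width=4):
    """Add two width-bit numbers; the carry ripples from the lowest place upward."""
    carry = 0
    result = 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry
```

Each loop iteration must wait for the carry of the previous place, which is the delay the text identifies as the limit on operation speed.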

If the number of places is increased to, for example, 16 bits, there is a limit to the speed-up achievable by
improving an individual full adder, and therefore the speed-up must be achieved by the 16-bit adder as a whole. Since
the operation speed of the adder is governed by the carry transmission speed as mentioned above, the speed-up can
be achieved if the carry signal of an adder can be determined without awaiting the carry signal from the lower adder.

A carry signal for every place can be created from only the input values of that place and the lower places and the
carry signal of the lowest place. This is called carry look ahead (CLA). An example of a circuit to which this method is
applied (CLA circuit) is shown in Fig. 42A. In Fig. 42A, HA indicates the half adder in Fig. 42B, and the part enclosed
by a dotted line is implemented by the CMOS circuit in Fig. 42C.
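
As an illustration of carry look ahead, the sketch below computes every carry directly from the input bits and the lowest carry, without waiting for the carry of the neighbouring place. It uses the common generate/propagate formulation, which is an assumption here and is not taken from Figs. 42A to 42C; the names cla_carries and cla_add are hypothetical.

```python
def cla_carries(x, y, c0=0, width=4):
    """Return [C0, C1, ..., Cwidth], each computed directly from the inputs and c0."""
    g = [(x >> i) & (y >> i) & 1 for i in range(width)]    # generate: X_i and Y_i
    p = [((x >> i) ^ (y >> i)) & 1 for i in range(width)]  # propagate: X_i xor Y_i
    carries = [c0]
    for i in range(width):
        carry = 0
        # Terms g[k] * p[k+1] * ... * p[i] for k = i down to 0.
        for k in range(i, -1, -1):
            term = g[k]
            for j in range(k + 1, i + 1):
                term &= p[j]
            carry |= term
        # Final term: c0 * p[0] * ... * p[i].
        term = c0
        for j in range(i + 1):
            term &= p[j]
        carries.append(carry | term)
    return carries


def cla_add(x, y, c0=0, width=4):
    """Add two width-bit numbers using the look-ahead carries."""
    carries = cla_carries(x, y, c0, width)
    total = 0
    for i in range(width):
        p_i = ((x >> i) ^ (y >> i)) & 1
        total |= (p_i ^ carries[i]) << i
    return total, carries[width]
```

The number of product terms per carry grows with the place, which is one reason a full-width CLA becomes costly in hardware.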

In actual circuit implementations, taking the hardware amount and efficiency into consideration, the carry signals for
all places are mostly not created by CLA. Instead, the carry signal is transmitted by using CLA on a block basis, each
block consisting of, for example, 4 bits, and by using a ripple carry within each block (block CLA). An example of a
16-bit adder using this method is illustrated in Fig. 43.

A subtraction is achieved by using the adder to add the 2's complement of the subtrahend to the minuend.
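
A small sketch of subtraction by 2's complement addition, assuming a fixed bit width; the function name and the default width of 8 bits are assumptions for illustration only.

```python
def subtract_twos_complement(minuend, subtrahend, width=8):
    """Compute (minuend - subtrahend) modulo 2**width using only addition."""
    mask = (1 << width) - 1
    # 2's complement: invert every bit of the subtrahend, then add 1.
    complement = ((~subtrahend) & mask) + 1
    return (minuend + complement) & mask
```

A negative difference appears in its 2's complement form, for example 5 - 13 yields 248 for a width of 8 bits.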

Even with the above methods, however, it is not easy to achieve still faster operation as the number of operands
increases, since both the number of elements and the operation time increase significantly with the number of
operands.

For example, when 63 pieces of data are added in total, six parallel stages of add operations can be performed as shown
in Fig. 44 for a high-speed operation, but 62 full adders are required. On the other hand, if the number of elements is
reduced, the operation can be performed by only a single full adder as shown in Fig. 45, but the addition must be
performed 62 times sequentially.

Then, a parallel multiplier is briefly described below as an example of a multiplier using a current operation method.

In a multiplication of (n x n) bits, a partial product is obtained as follows:

$$P_j = X \cdot Y_j \quad (j = 0, 1, \ldots, n-1)$$

where the partial product $P_j$ is the result of multiplying the following multiplicand X by a single bit $Y_j$
(j = 0, 1, ..., n-1) of a multiplicator Y:

$$X = \sum_{i=0}^{n-1} 2^i X_i$$

Since there are only 0s and 1s in binary numbers, $P_j$ is always 0 when $Y_j$ is 0, and each bit of $P_j$ equals the
corresponding bit of X when $Y_j$ is 1. Accordingly, the partial product can be obtained by taking the AND of each bit of
the multiplicand and a bit of the multiplicator. By adjusting the places of the created partial products according to the
weights of the multiplicator bits and adding them together, the following multiplication result can be obtained:

$$X \cdot Y = \sum_{j=0}^{n-1} \sum_{i=0}^{n-1} 2^{i+j} X_i Y_j$$

The most fundamental parallel multiplier can be achieved by arranging, in an array, hardware (AND gates) for creating the
partial products above and circuits for adding the partial products, and connecting them. A parallel multiplier of
8 bits x 8 bits is shown as an example in Fig. 46. As shown in this drawing, the parallel multiplier includes full adders
301, half adders 302, and AND gates 303.

As shown in this example, in a multiplication of (n x n) bits, the partial products are calculated easily and quickly
by $n^2$ AND gates, and the addition step for adding the partial products governs the operation speed. Therefore,
increasing the speed of the addition step for the partial products is the key to speeding up a multiplier.
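
The partial product scheme described above can be illustrated by the following sketch, in which each partial product bit is the AND of one multiplicand bit and one multiplicator bit, and the partial products are shifted by the weight of the multiplicator bit before being added. The function names are hypothetical, and ordinary integer addition stands in for the adder array of Fig. 46.

```python
def partial_products(x, y, n):
    """Row j holds the bits of P_j; bit i of P_j is X_i AND Y_j (n*n AND results)."""
    return [[((x >> i) & 1) & ((y >> j) & 1) for i in range(n)] for j in range(n)]


def multiply(x, y, n=8):
    """Multiply two n-bit numbers by summing place-adjusted partial products."""
    total = 0
    for j, row in enumerate(partial_products(x, y, n)):
        p_j = sum(bit << i for i, bit in enumerate(row))
        total += p_j << j  # adjust the place by the weight of Y_j
    return total
```

Creating the AND terms is a single parallel step; the repeated additions in the loop correspond to the step that governs the operation speed.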

As methods for improvement, there are a carry save adder method, which makes it possible to avoid transmitting a carry
signal within the same stage by transmitting the carry signal of an addition stage for the partial products to an adder
in the next addition stage; a Wallace-tree method (Wallace, C., IEEE Trans. on Electronic Computers, EC-13, 1, 1964,
pp. 14-17) for performing the addition steps on the same place in parallel; and a method in which a Booth
algorithm (Rubinfield, L., IEEE Trans. on Computers, C-24, 10, 1975, pp. 1014-1015) is used to decrease the number
of created partial products and thereby increase the speed of the operation.

In these methods, however, both the number of elements and the operation time increase significantly with the number
of bits, and it is not easy to further increase the speed against this tendency.

A multiplier to which multivalued logic is applied has been reported recently (T. Hanyu et al., Proc. IEEE Int.
Symp. on MVL, pp. 19-26, May (1994), Nov. 1993). It has not, however, been put to practical use at present.

It is an object of the present invention to provide a processor which performs high-speed operations with few
elements, its method, and a data processor, solving the above technical problems.

In one aspect, the present invention aims to reduce the number of required elements and to lower the power
consumption of a processor and a data processor while increasing the operation speed.

It is a separate object of the invention to increase the operation speed by removing the transmission of carry
signals.

It is a separate object of the invention to reduce the amount of data to be added by reorganizing the data, so as to
increase the operation speed and to reduce the number of elements required for operations.

It is a separate object of the invention to increase the processing speed by executing operations in parallel.

According to one aspect, the present invention provides a processor for adding a plurality of multiple bit data,
comprising first addition means for adding data together on common places of the plurality of multiple bit data and
second addition means for calculating a sum of the addition results obtained by the first addition means.

According to another aspect, the present invention provides a processor for multiplying a plurality of multiple bit
data, comprising partial product creation means for creating partial products of respective single bits of the plurality
of multiple bit data, first addition means for adding data together on common places of a plurality of partial products
created by the partial product creation means, and second addition means for calculating a sum of the addition results
obtained by the first addition means.

According to still another aspect, the present invention provides an operation method for adding a plurality of 
multiple bit data comprising a first addition step for adding respective places independently and a second addition step 
for calculating a sum of the addition results obtained by the first addition step. 

According to yet another aspect, the present invention provides an operation method for multiplying a plurality of
multiple bit data, comprising a partial product creation step for creating partial products of respective single bits of the
plurality of multiple bit data, a first addition step for adding data together on common places of a plurality of partial
products created by the partial product creation step, and a second addition step for calculating a sum of the addition
results obtained by the first addition step.

According to another aspect, the present invention provides a data processor comprising input means for entering data,
storing means for storing data, processing means for processing data stored in the storing means and data entered by the
input means in a specified processing procedure, and output means for outputting processing results of the processing
means, wherein the processing means includes first addition means for adding data together on common places of a
plurality of multiple bit data and second addition means for calculating a sum of the addition results obtained by the
first addition means, and executes an addition of a plurality of multiple bit data.

Other objectives and advantages besides those discussed above shall be apparent to those skilled in the art from 
the description of a preferred embodiment of the invention which follows. In the description, reference is made to 
accompanying drawings, which form a part thereof, and which illustrate an example of the invention. Such example, 
however, is not exhaustive of the various embodiments of the invention, and therefore reference is made to the claims 
which follow the description for determining the scope of the invention. 

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 is a drawing illustrating an adder of the 1st embodiment;
Fig. 2 is a drawing illustrating a majority logic circuit of the 1st embodiment;
Figs. 3A and 3B are drawings illustrating an ND;
Fig. 4 is a drawing illustrating an adder of the 2nd embodiment;
Fig. 5 is a drawing illustrating an adder for performing the second addition step of the 2nd embodiment;
Fig. 6 is a drawing illustrating an adder of the 3rd embodiment;
Fig. 7 is a drawing illustrating a partial product creation circuit of the 3rd embodiment;
Fig. 8 is a drawing illustrating a multiplier of the 3rd embodiment;
Fig. 9 is a drawing illustrating an adder of the 3rd embodiment;
Fig. 10 is a drawing illustrating the number of full adder passage stages of the 3rd embodiment;
Fig. 11 is a flowchart illustrating a multiplication processing procedure of the 3rd embodiment;
Fig. 12 is a drawing illustrating a multiplication circuit of the 3rd embodiment;
Fig. 13 is a drawing illustrating an ND used in the 3rd embodiment;
Fig. 14 is a drawing illustrating a majority logic circuit used in the 3rd embodiment;
Fig. 15 is a drawing illustrating a majority logic circuit used in the 3rd embodiment;
Fig. 16 is a timing diagram of signals used in the 3rd embodiment;
Fig. 17 is a drawing illustrating an ND used in the 4th embodiment;
Fig. 18 is a drawing illustrating a majority logic circuit used in the 4th embodiment;
Fig. 19 is a timing diagram of signals used in the 4th embodiment;
Fig. 20 is a drawing illustrating a majority logic circuit used in the 5th embodiment;
Fig. 21 is a drawing illustrating a multiplier of the 5th embodiment;
Fig. 22 is a drawing illustrating a multiplier of the 6th embodiment;
Fig. 23 is a drawing illustrating a multiplier of the 7th embodiment;
Fig. 24 is a drawing illustrating a multiplier of the 8th embodiment;
Fig. 25 is a drawing illustrating an adder of the 9th embodiment;
Fig. 26 is a drawing illustrating an adder of the 10th embodiment;
Fig. 27 is a drawing illustrating an adder of the 11th embodiment;
Fig. 28 is a drawing illustrating a 2-bit adder of the 12th embodiment;
Fig. 29 is a drawing illustrating an adder of the 12th embodiment;
Fig. 30 is a drawing illustrating an adder of the 13th embodiment;
Fig. 31 is a drawing illustrating a multiplier of the 14th embodiment;
Fig. 32 is a drawing illustrating another multiplier of the 14th embodiment;
Fig. 33 is a drawing illustrating an adder of the 17th embodiment;
Fig. 34 is a drawing illustrating another adder of the 17th embodiment;
Fig. 35 is a drawing illustrating an adder of the 18th embodiment;
Fig. 36 is a drawing illustrating a multiplier of the 20th embodiment;
Fig. 37 is a drawing illustrating a DSP of the 21st embodiment;
Fig. 38 is an operation timing diagram of the 21st embodiment;
Fig. 39 is a drawing illustrating a reception circuit of the 22nd embodiment;
Fig. 40 is a drawing illustrating a card-type transmission/reception unit of the 22nd embodiment;
Figs. 41A to 41D are drawings for explanation of a conventional adder;
Figs. 42A to 42C are drawings for explanation of a conventional CLA circuit;
Fig. 43 is a drawing illustrating an example of a configuration of a conventional adder to which block CLA is applied;
Fig. 44 is a drawing illustrating an example of a configuration of a conventional adder;
Fig. 45 is a drawing illustrating an example of a configuration of a conventional adder; and
Fig. 46 is a drawing illustrating an example of a configuration of a conventional multiplier.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

This invention will be described in detail below with reference to the accompanying drawings.
[1st embodiment] 

According to this embodiment, an addition method for a plurality of multiple bit data will be described by using an
operation of adding seven 8-bit data sequences as an example.

Referring to Fig. 1, this embodiment is shown. To add seven 8-bit data sequences in this embodiment, a first
addition step is performed first by adding the seven 8-bit data sequences together on respective places. The structure
of this addition is described later in detail, and the calculation is made by using a block 11 having a function of
outputting, in a binary mode, the number of High inputs among its inputs (indicated by $S_{pq}$ in Fig. 1, where p
denotes the weight of the place).

Hereinafter, the block 11 having this function is called a Number Detector, abbreviated to ND. In Fig. 1, an ND 11
is represented by a box. The numbers in each box indicate the number of inputs (In) and the number of outputs (Out)
on the left and right sides of a slash (/), respectively. The number of outputs is determined by the number of inputs
and is expressed by $\mathrm{Out} = [\log_2(\mathrm{In})]$, where [a] is the minimum integer Z which is greater than a.

According to this embodiment, the operation speed is determined by the speed of the slowest of all the NDs
processing the first addition step in parallel. Since the operation speed is always the same here, it is determined
by the operation speed of one ND. In this embodiment, a data sequence has 8 bits and eight NDs are used. A maximum of
seven inputs are made to each ND here because seven 8-bit data sequences are added.

Since a carry occurs in a general addition operation, the operation speed is lowered by the carry propagation.
This embodiment, however, has a feature of adding data together by performing additions without a carry in parallel,
so that it is possible to increase the operation speed. Although this embodiment shows only an example of adding seven
8-bit data sequences, it is to be understood that the invention is not limited to this example and that various numbers
of bits are allowed in the plurality of multiple bit data.

After that, a desired addition result Q can be obtained at a high speed by executing a second addition step for
adding all of the eight addition results represented in a binary mode.
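
The two addition steps of this embodiment can be modelled in software as below. This is only a behavioural sketch under stated assumptions: each ND is treated as a simple counter of 1 inputs on one place, the second addition step is written as an ordinary weighted sum, and all function names are hypothetical.

```python
def number_detector(bits):
    """Behavioural ND: the number of inputs that are 1."""
    return sum(bits)


def nd_output_width(n_inputs):
    """Out = smallest integer strictly greater than log2(In), for In >= 1."""
    return n_inputs.bit_length()


def add_by_places(values, width=8):
    """First step: count the 1s on every place independently (no carries).
    Second step: add the per-place counts, each weighted by its place."""
    counts = [number_detector([(v >> p) & 1 for v in values]) for p in range(width)]
    total = sum(c << p for p, c in enumerate(counts))
    return counts, total
```

For seven inputs, nd_output_width gives 3 outputs, and the per-place counts never depend on one another, so all eight NDs can operate at the same time.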

The above ND is described below. First, a circuit diagram of a five-input (A, B, C, D, and E) majority logic circuit
is shown in Fig. 2. The five-input majority logic circuit has a logic in which High is output when three or more of the
five inputs are High. In Boolean algebra notation, it is represented by $A(B+C)(D+E) + C(B+E)(A+D) + E(A+B)(C+D)$,
and it can be easily formed by a CMOS circuit made of AND gates 21 and OR gates 22. Although five inputs are used here,
it is apparent that it can be expanded to a general n inputs.
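
The following sketch checks a five-input majority expression exhaustively against the definition "three or more of five inputs are High". The expression used, A(B+C)(D+E) + C(B+E)(A+D) + E(A+B)(C+D), is a reading of the Boolean expression given above and should be treated as an assumption; the function names are hypothetical.

```python
from itertools import product


def majority5_expression(a, b, c, d, e):
    """A(B+C)(D+E) + C(B+E)(A+D) + E(A+B)(C+D), using AND (&) and OR (|) only."""
    return (a & (b | c) & (d | e)) | (c & (b | e) & (a | d)) | (e & (a | b) & (c | d))


def majority(bits):
    """1 when more than half of the inputs are 1 (three or more of five)."""
    return int(sum(bits) > len(bits) // 2)


# Compare the expression with the definition for all 32 input combinations.
all_match = all(
    majority5_expression(*v) == majority(v) for v in product((0, 1), repeat=5)
)
```

Under this reading the two functions agree for every input combination, so the expression needs only AND and OR operations.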

Referring to Fig. 3A, it shows a circuit for determining whether n bits are true in m bits by using a plurality of
five-input majority logic circuits 31, where each output $F_j(X_1, \ldots, X_7)$ goes High according to the number of
High inputs.

Fig. 3B shows a circuit which serves as an ND by connecting the output of an array 32 corresponding to the circuit
in Fig. 3A with a circuit 33 for converting the output data to binary codes of 3-bit binary numbers. An output example
is shown assuming that 5 bits are true in 7 bits. Although an ND comprising a CMOS circuit is explained here as an ND
example, the invention is not limited to it, and any circuit having the above ND function can be used.

[2nd embodiment]

In this embodiment, there is provided an example of increasing the speed of the second addition step in order to
increase the operation speed of the addition in the first embodiment.

Referring to Fig. 4, it shows a configuration of an adder of this embodiment. As shown in this drawing, out of the
3-bit output data obtained by the NDs in the first embodiment, addition results with no common places can be integrated
into one 10-bit data sequence. This will be described below by using the example in Fig. 4.

In this drawing, the 3-bit output data on the places enclosed by an elliptic frame can be integrated into a 10-bit data
sequence A since they have no common places. (Some places have no value when three addition results are integrated, and
they are set to 0. In this example, the first place is set to 0.) It is important that, although this processing is
treated as a step of the algorithm, no operation is performed in it; in the circuit it is accomplished with wiring only.

In this step, the eight results of the addition can be converted to three 10-bit data sequences. The delay time is so
short that it can be ignored in comparison with the other steps. In the end, by adding the three 10-bit data sequences, a
final operation result can be obtained. Since three 10-bit data sequences are used in the example of Fig. 4, a final
result of the addition can be obtained by a passage of full adders in only two stages as shown in Fig. 5, and a plurality
of multiple bit data can be added at a high speed.
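
The integration of ND outputs with no common places can be sketched as follows. The grouping rule used here (outputs whose places differ by a multiple of three share one sequence) is an assumption chosen because 3-bit outputs placed three places apart cannot overlap; the actual grouping is the one shown in Fig. 4. The function names are hypothetical.

```python
def column_counts(values, width=8):
    """Per-place counts of 1 bits, as produced by the NDs of the first step."""
    return [sum((v >> p) & 1 for v in values) for p in range(width)]


def integrate(counts, out_bits=3):
    """Place each ND output at its weight; outputs that cannot overlap share a sequence.
    Only bit placement is involved, which corresponds to wiring in the circuit."""
    sequences = [0] * out_bits
    for place, count in enumerate(counts):
        sequences[place % out_bits] |= count << place
    return sequences


values = [255] * 7                        # seven 8-bit data sequences
sequences = integrate(column_counts(values))
result = sum(sequences)                   # addition of the three integrated sequences
```

With seven 8-bit inputs this yields three sequences of at most 10 bits each, and their sum equals the sum of the original data.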

Next, this embodiment is described below for a general operation in which m data sequences of maximum n bits are added.
The result of addition output from each of the n NDs has a maximum of $[\log_2 m]$ bits, where [a] is, as in the first
embodiment, the minimum integer Z which is greater than a; therefore, the results can be converted to a maximum of
$[\log_2 m]$ data sequences of $(n + [\log_2 m])$ bits. In the end, by adding the $[\log_2 m]$ data sequences of
$(n + [\log_2 m])$ bits, a final operation result can be obtained. The number of full adder passage stages can be
expressed by $\lceil \log_2 [\log_2 m] \rceil$, where $\lceil a \rceil$ is the minimum integer Z which is greater than
or equal to a. From the above expression, the number of full adder passage stages can be kept low even if the number of
multiple bit data sequences becomes higher.
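
To show how slowly the stage count grows, the sketch below evaluates the two expressions above with integer arithmetic. It assumes m >= 1 and uses int.bit_length to obtain both bracket functions without floating point; the function names are hypothetical.

```python
def nd_output_bits(m):
    """[log2 m]: the smallest integer strictly greater than log2(m), for m >= 1."""
    return m.bit_length()


def full_adder_stages(m):
    """Ceiling of log2 of the number of integrated sequences [log2 m]."""
    k = nd_output_bits(m)
    return (k - 1).bit_length()  # equals ceil(log2(k)) for k >= 1
```

For example, m = 7 gives three sequences and two stages, and even m = 1024 gives eleven sequences and only four stages.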

[3rd embodiment]

In this embodiment, multiplication of multiple bit data will be explained. Although it is described by giving an example
of an 8x8 bit multiplier below, it can be expanded to a general mxn bit multiplication.

Assume that X x Y = Q, where X($X_7 X_6 X_5 X_4 X_3 X_2 X_1 X_0$) is a multiplicand and
Y($Y_7 Y_6 Y_5 Y_4 Y_3 Y_2 Y_1 Y_0$) is a multiplicator. Since the maximum value is $2^8 - 1$ in a decimal number for
both X and Y, the expression $Q \le (2^8 - 1)^2 < 2^{16} - 1$ is obtained and Q is expressed with a maximum of 16 bits.
For mxn bits, the expression $Q \le (2^m - 1)(2^n - 1) < 2^{m+n} - 1$ is obtained and Q is expressed with a maximum of
m+n bits.

As shown in Fig. 6, a partial product X x $Y_j$ is created first. Although a partial product can be calculated by taking
the AND of each bit $X_i$ of the multiplicand X and a bit $Y_j$ of the multiplicator, as in a general CMOS multiplier,
in this embodiment simply nMOS transistors having a common gate electrode are used, as shown in Fig. 7, for simplicity of
explanation. In addition, although an nMOS transistor is used as an example, it is apparent that other transmission gate
MOS transistors can be used.

Preferably, $X_i$ should be set to Low (0) and $Y_j$ to High (1) in an initial state and all of the outputs should be
set to 0. Then $X_i$ is entered after setting $Y_j$ to a Low state. In operation, High (1) or Low (0) is entered in
$Y_j$ in this state. In other words, when $Y_j$ is set to High, a High signal is entered in the gate electrode and the
nMOS transistor is turned on to generate the following 8-bit data sequence:

$$X \times Y_j = (X_7 Y_j, X_6 Y_j, X_5 Y_j, X_4 Y_j, X_3 Y_j, X_2 Y_j, X_1 Y_j, X_0 Y_j) = X(X_7, X_6, X_5, X_4, X_3, X_2, X_1, X_0).$$

When $Y_j$ is set to Low (0), a Low signal is entered in the gate electrode; therefore, the nMOS transistor is turned off
and the 8-bit data sequence (0, 0, 0, 0, 0, 0, 0, 0) of the initial state is created. This makes it possible to form an
AND circuit for X x Y in a smaller size than a general AND circuit. A general AND circuit, however, can also be used for it.

Then, data on respective places of the partial products in Fig. 6 is added together by an ND for each place. Since the
addition is performed in parallel in this step, it is suitable for a high speed operation. (m+n-1) NDs are used in an
mxn-bit multiplier circuit. The maximum number of inputs entered to an ND is Min(m, n). As shown in Fig. 6, 15 NDs are
used in the example of the 8x8-bit multiplier. A maximum of eight inputs are made (the operation
$X_7 Y_0 + X_6 Y_1 + X_5 Y_2 + X_4 Y_3 + X_3 Y_4 + X_2 Y_5 + X_1 Y_6 + X_0 Y_7$ is performed).
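
The number of NDs and their input counts can be checked with a short sketch that counts, for every place k, the partial product bits $X_i Y_j$ with i + j = k. The function name column_inputs is hypothetical.

```python
def column_inputs(m, n):
    """For each place k = 0 .. m+n-2, the number of partial product bits X_i*Y_j with i + j = k."""
    return [
        sum(1 for i in range(m) for j in range(n) if i + j == k)
        for k in range(m + n - 1)
    ]


cols = column_inputs(8, 8)
nd_count = len(cols)                                        # one ND per place
max_inputs = max(cols)                                      # largest ND input count
places_with_2_or_more = sum(1 for c in cols if c >= 2)
places_with_3_or_more = sum(1 for c in cols if c >= 3)
```

For the 8x8 case this gives 15 places with at most 8 inputs; 13 places have two or more inputs and 11 places have three or more.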

The above number of NDs applies when NDs are also used at locations with a one-input one-output arrangement, where an ND
can be replaced by wiring. If these replaceable NDs are omitted, (m+n-3) NDs are used. Further, if NDs are used only
for three or more inputs, excluding the locations where NDs can be replaced by two-input two-output HAs (although an HA
is a kind of ND, it is distinguished from an ND here), (m+n-5) NDs are used.

Generally, for three or more inputs, an additive operation becomes complicated and the operation speed is
lowered, particularly by the propagation of carries which may occur. Since this embodiment has a feature
of performing an operation without carries when adding data together, a high speed operation can be performed. Fig. 8
is a block diagram of the drawing in Fig. 6. For simplification, the partial product formation unit serving as an input
unit is omitted. The partial product formation unit can be replaced by an AND circuit. In this drawing, an ND is
represented by a box.

After that, by performing a second addition step for adding all of the (m+n-1) addition results indicated in a binary
mode, the desired multiplication result Q is obtained at a high speed.

Further, to reduce the number of additive operations, the following operation method is introduced in the same
manner as for the second embodiment.

In other words, the addition results output from the (m+n-1) NDs have a maximum of $[\log_2(\mathrm{Min}(m, n))]$ bits
each; therefore, each uses only a part of the places in the (m+n) bits of the final multiplication result Q. In the
example of Fig. 6, an output from an ND has a maximum of four bits and the final multiplication result has 16 bits.
Accordingly, out of the addition results output from the (m+n-1) NDs, addition results with no common places can be
integrated into one (m+n)-bit data sequence.

The above operation is explained by using the example in Fig. 6. Since the addition results output from the NDs on
the places enclosed by ellipses do not have common places, they can be integrated into a 16-bit data sequence B. (If
a place has no value when the four addition results from the NDs are integrated, it is set to zero. In this example,
places 2 to 4, 8, 12, 15, and 16 are set to zero.) It is important that, although this processing is treated as a step
of the algorithm, no operation is performed in it; in the circuit it is accomplished with wiring only.

In this step, the (m+n-1) results of the addition can be converted to $[\log_2(\mathrm{Min}(m, n))]$ (m+n)-bit data
sequences. The delay time is so short that it can be ignored in comparison with the other steps. In the end, by adding
the $[\log_2(\mathrm{Min}(m, n))]$ (m+n)-bit data sequences, a final operation result can be obtained.

Since four 16-bit data sequences are used in the examples of Figs. 6 and 8, a final result of the product can be
obtained by a passage of full adders in only two stages as shown in Fig. 9. In general, the number of full adder passage
stages can be expressed by $\lceil \log_2 [\log_2(\mathrm{Min}(m, n))] \rceil$ by using the same symbols as for the
second embodiment.

Referring to Fig. 10, there is shown a graph with Min(m, n) on the abscissa and the number of full adder passage stages
on the ordinate. As shown in this graph, even if m and n are increased, the number of full adder passage stages
can be kept low since it is obtained by taking the logarithm twice. In other words, the high speed
operation is maintained even if the number of bits is increased.

A flowchart of the above operation method is shown in Fig. 11.

In step S111, first, a partial product X x $Y_j$ is created by an AND circuit or a switch. Next, in step S112, data on
respective places of the created partial products X x $Y_j$ is added together in parallel by NDs. Subsequently, in step
S113, out of the addition results in step S112, bits with no common places are integrated into one data sequence. As
mentioned above, however, there is no operation of a device corresponding to this step, and just a connection is made
between the outputs from the NDs and the inputs to the full adders in the rear stage. Finally, in step S114, the data
integrated in step S113 is added by full adders.
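
Steps S111 to S114 can be followed in software with the sketch below. It is a behavioural model only: the NDs are modelled as per-place counters, the grouping used in S113 (places that differ by a multiple of the ND output width share one sequence) is an assumption rather than the grouping of Fig. 6, and the final addition is an ordinary integer sum. The function name is hypothetical.

```python
def multiply_by_places(x, y, m=8, n=8):
    """Multiply an m-bit x by an n-bit y following steps S111 to S114."""
    # S111: partial product bits X_i AND Y_j.
    pp = [[((x >> i) & 1) & ((y >> j) & 1) for i in range(m)] for j in range(n)]

    # S112: one ND per place counts the 1 bits with i + j equal to that place.
    counts = [0] * (m + n - 1)
    for j in range(n):
        for i in range(m):
            counts[i + j] += pp[j][i]

    # S113: integrate non-overlapping counts into data sequences (wiring only).
    out_bits = min(m, n).bit_length()  # maximum ND output width
    sequences = [0] * out_bits
    for place, count in enumerate(counts):
        sequences[place % out_bits] |= count << place

    # S114: add the integrated data sequences.
    return sum(sequences), sequences
```

For the 8x8 case this produces four data sequences that each fit in 16 bits, and their sum is the product.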

Then, actual multiplication circuits for executing the above operation method, such as the NDs used in this embodiment,
are explained with reference to Fig. 12. A multiplicand input unit 71 is used to enter a multiplicand X. A multiplicator
input unit 72 is used to enter a multiplicator Y. A partial product generator unit 73 is an AND circuit or a switch as
shown in Fig. 7 and generates partial products. As mentioned above, partial products can be generated by circuits having
other configurations. An ND 74 is used to add data on the same places of a plurality of multiple bit data (the respective
partial products in this embodiment) together in parallel.

Referring to Fig. 13, there is shown a typical diagram illustrating a seven-input ND. The NDs used in this embodiment
have majority logic circuit blocks 131-A, 131-B, and 131-C and an inverter 132, and their configuration is different from
that of the first embodiment. Signals entered into terminals 134 and 135 are the same as those entered into an input
terminal 133. Terminals 136, 137, and 138 are used as destinations for entering output signals from a majority logic
circuit block in a front stage. In Fig. 13, 2C and 4C indicate the capacitor values connected correspondingly to the
input terminals 136, 137, and 138, assuming that C indicates the capacitance connected to a general input terminal. In
this drawing, signals are entered to the majority logic circuit blocks 131-A, 131-B, and 131-C.

For example, when a signal is entered to the seven-input majority logic circuit block 131-A, HIGH LEVEL is output
from the block 131-A if there are a majority of HIGH LEVEL data, that is, if four or more inputs are HIGH LEVEL in seven
inputs. In the same manner, for example, if six or more inputs are HIGH LEVEL in an 11-input majority logic circuit
block, or if seven or more inputs are HIGH LEVEL in a 13-input majority logic circuit block, HIGH LEVEL is output.
In Table 1, S3 indicates an output value of the seven-input majority logic circuit block for each HIGH LEVEL input count.

Subsequently, as shown in Fig. 13, the output of the seven-input majority logic circuit block 131-A is inverted by
an inverter and entered to the weighted input terminal 136 of the majority logic circuit block 131-B. A circuit
configuration of the majority logic circuit block 131-B is shown in Fig. 14. In this drawing, a capacitor 212 has about
four times the capacitor value of a capacitor 202 connected to another input terminal path. Assuming that C is the
capacitor value connected to an input terminal path, the circuit is an 11-input majority logic circuit having a
configuration in which 11 Cs are connected in common, the signal from the weighted input terminal is entered to four of
these Cs, and the same signals as those entered to the block 131-A are entered to the other seven terminals.

For example, if four or more inputs are HIGH LEVEL in seven inputs, LOW LEVEL is entered to the weighted input
terminal as described above. If six or more inputs are HIGH LEVEL in seven inputs in the signals entered to the input
terminals other than the weighted input terminal, the 11-input majority logic circuit determines it as a majority in
total and outputs HIGH LEVEL. If four or five inputs are HIGH LEVEL in seven inputs, it is not determined
as a majority and LOW LEVEL is output. If three or fewer inputs are HIGH LEVEL in seven inputs, HIGH LEVEL is
entered into the weighted input terminal. If two or three inputs are HIGH LEVEL in seven inputs, there
are four plus two or four plus three, that is, six or more inputs; therefore, it is determined to be a majority and HIGH
LEVEL is output. If one or fewer inputs are HIGH LEVEL, there are four plus zero or four plus one, that is, five or
fewer inputs, and LOW LEVEL is output.

In Table 1, S2 indicates an output value of the majority logic circuit block 131-B for each HIGH LEVEL input count.
As for the majority logic circuit block 131-C, outputs as shown in S1 in Table 1 can be obtained by entering an inversion
signal for outputs of the majority logic circuits 131-A and 131-B into two weighted terminals having a four-fold capacitor
value and a two-fold capacitor value before an operation.

According to this circuit configuration, the number of the HIGH LEVEL inputs out of two or more inputs can be converted
to a binary number in three places and output as shown in Table 1.
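
The behaviour of the three blocks 131-A, 131-B and 131-C described above can be checked with a short behavioural sketch. It is only a logical model under stated assumptions: the function names `majority` and `nd7` are hypothetical, every unit capacitor C is treated as one equally weighted vote, and the weighted terminals are modelled as four or two extra votes driven by the inverted outputs.

```python
def majority(total_high, total_inputs):
    """Return 1 when more than half of the equally weighted inputs are HIGH."""
    return 1 if 2 * total_high > total_inputs else 0


def nd7(inputs):
    """Count the HIGH bits of seven inputs with three majority blocks."""
    assert len(inputs) == 7
    n = sum(inputs)
    # Block 131-A: plain 7-input majority gives the most significant bit S3.
    s3 = majority(n, 7)
    # Block 131-B: 11 unit inputs, four of them driven by the inverted S3.
    s2 = majority(n + 4 * (1 - s3), 11)
    # Block 131-C: 13 unit inputs, inverted S3 weighted 4, inverted S2 weighted 2.
    s1 = majority(n + 4 * (1 - s3) + 2 * (1 - s2), 13)
    return s3, s2, s1


if __name__ == "__main__":
    for high in range(8):
        bits = nd7([1] * high + [0] * (7 - high))
        print(high, bits)
```

For every HIGH LEVEL input count from 0 to 7, the triple (S3, S2, S1) read as a binary number equals that count.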

A typical circuit diagram of a majority logic circuit block is shown in Fig. 15. This majority logic circuit block includes
a reset switch 201, a capacitor 202, a signal transfer switch 203, a sense amplifier 205, an inverter 206 in the sense
amplifier, a second inverter 204 in the sense amplifier, a second reset switch 207 for resetting the inverter 206, a reset
power supply 208, a second reset power supply 210, an output terminal 211, and a parasitic capacitor 209 at an end
of the capacitor 202 connected in common. It is typically illustrated in Fig. 15, though the invention is not limited to this
configuration.

Fig. 16 is an operation timing diagram of the circuit in Fig. 15. Referring to this drawing, the operation is described
below. First, a terminal at an end of the capacitor 202 is reset by a reset pulse φRES. For a reset voltage, for example,
if a supply voltage is of the 5 V series, 2.5 V, which is a half of the voltage, is used. The reset voltage, however, is not limited
to it, but any other voltage can be used. Almost simultaneously, an input terminal of the inverter 206 in the sense
amplifier is reset by making the reset switch 207 conduct. At this time, a value is selected as the reset voltage which
is near a logical inversion voltage for inverting an output of the inverter. If the reset pulse φRES is set off, both terminals
of the capacitor 202 are kept at their respective reset potentials.

Next, when the transfer switch 203 conducts due to a transfer pulse φT, a signal is transferred to a terminal of the
capacitor 202 and a potential at the terminal of the capacitor is changed from, for example, the 2.5 V reset voltage to
0 V equivalent to a low level or 5 V equivalent to a high level. Assuming that C indicates a capacity of the capacitor
202, C0 indicates a capacitor value of the parasitic capacitor, and N units of the capacitors 202 are connected in parallel,
terminals connected in common at an end of the capacitors 202 change for an input from a value near the logical
inversion voltage of the inverter by ±[2.5C/(C0 + NC)] [V] due to a capacity division.

When an input terminal voltage of the inverter 206 changes from the logical inversion voltage, an output terminal
voltage of the inverter 206 is inverted according to it. When respective signals are entered for N inputs, the sum of the N
capacity division outputs is entered into the input terminal of the inverter 206. If a majority of HIGH LEVEL signals
are entered out of N inputs, the input terminal of the inverter 206 shifts from the logical inversion voltage to a higher potential
and HIGH LEVEL is output to the output terminal 211 of the sense amplifier. If a majority of LOW LEVEL signals are
entered, LOW LEVEL is output. By using the above configuration, the circuit in Fig. 15 serves as a majority logic circuit
for outputting a logical value which is in the majority of two or more inputs.
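
The capacity division described for Fig. 15 can be illustrated numerically. The following is a minimal sketch, not a circuit simulation: the names `node_shift` and `sense_amplifier` and the default parasitic value of 0.1 C are assumptions chosen only for illustration, and the inverter is idealised as a comparator at the logical inversion voltage.

```python
def node_shift(inputs, c_unit=1.0, c_parasitic=0.1, v_swing=2.5):
    """Voltage shift of the common node, relative to the inversion voltage.

    Each input moves its capacitor terminal by +v_swing (HIGH) or -v_swing (LOW)
    from the reset voltage; the common node sees the capacitively divided sum.
    """
    n = len(inputs)
    # Contribution of one input: 2.5 * C / (C0 + N * C) volts.
    per_input = v_swing * c_unit / (c_parasitic + n * c_unit)
    return sum(per_input if x else -per_input for x in inputs)


def sense_amplifier(inputs, **kwargs):
    """Return 1 (HIGH) when the common node rises above the inversion voltage."""
    return 1 if node_shift(inputs, **kwargs) > 0 else 0


if __name__ == "__main__":
    print(node_shift([1, 1, 1, 1, 0, 0, 0]))       # small positive shift
    print(sense_amplifier([1, 1, 1, 1, 0, 0, 0]))  # majority HIGH
    print(sense_amplifier([1, 1, 1, 0, 0, 0, 0]))  # majority LOW
```

With an odd number of equal capacitors the shift is never zero, so the idealised comparator always resolves to the majority value.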

Although seven-input NDs are illustrated as an example in Fig. 13, it will be understood that they are not limited
to this configuration, but can be easily expanded to a multiple input configuration. In addition, latch circuits can be
flexibly connected between the majority logic circuits for a pipeline processing to make a configuration for a further
increase of the operation speed.

The number of the majority logic circuit blocks required by the NDs is expressed by [Log2 n], where n is the number
of the inputs to the NDs. As for the number of the inputs to the NDs, a value is applied from 1 to Min(m, n) in an m x n-bit
multiplier, and apparently its operation time becomes the longest at an ND whose input count is Min(m, n), which is
the maximum number of the inputs. It is because the number of the majority logic circuit stages is increased for the
number of the inputs n with [Log2 n]. It is apparent, however, that the number of the stages is not increased significantly
when the number of the bits is increased since it increases with a log function.

Since the operation is performed in parallel, the operation of the plurality of NDs 74 terminates at the operation speed of the
ND with the maximum number of the inputs Min(m, n). For NDs which terminate the operation earlier, it is preferable
to connect a latch circuit 79 to adjust the timing, though the measures are not limited to it.

This ND configuration leads to a higher operation speed due to a parallel operation and to NDs with lower power
consumption due to fewer elements required, so that the characteristics of the operation methods in the above
embodiments can be significantly improved.

There is a step for integrating data output from a plurality of NDs into one data sequence on an algorithm, but no
processing is performed in circuits as mentioned above, and therefore, there are no circuits corresponding to this step
in Fig. 12. Full adders 75, 76, and 77 are 16 or smaller bit adders in an example of an 8x8 bit multiplication shown in
Fig. 6. In Fig. 12, an 8x8 bit multiplier is used; therefore, three adders are required and they are arranged in two stages.
Although general carry look ahead (CLA) typed full adders are used as the adders, it will be understood that this
invention is not limited to this.

In addition, latch circuits 79 are connected between adders to perform an operation with the first adders during an
operation with the adders in the second stage in the so-called pipeline method to achieve a higher operation speed,
though the invention is not limited to this method. An operation result output unit 60 outputs an operation result in 16
bits in this example of an 8x8 bit multiplication.

The above multiplier configuration makes it possible to form a high speed multiplier due to a small number of 
elements, low power consumption, and a parallel operation. 


[4th embodiment] 

Another configuration of the ND section for parallel batch additions in the first to third embodiments is described
below.

Fig. 17 shows a typical drawing of NDs used for this embodiment, including a parallel operation circuit block 401
and latch circuits 12. Fig. 18 shows a circuit diagram of the parallel operation circuit block 401. In this drawing, terminals
501, 502, and 503 are a first weighted input terminal, a second weighted input terminal, and a third weighted input
terminal, respectively. Each terminal has a capacitor having almost one-fold, two-fold, or three-fold the capacitor value
of a capacitor connected to another input terminal path. An operation timing diagram of this embodiment is shown in Fig. 19. The parallel
operation circuit block 401 is operated by pulse φRES or φT. The latch circuits 12 are operated by pulse φPH.

First, referring to Fig. 18, a basic operation is explained. An input signal is latched in a latch circuit 12-A. At this
time, 0 V equivalent to a low level is applied to the weighted terminals 501 and 502 and 5 V equivalent to a high level
is applied to the weighted terminal 503 with a pulse φSET. Next, the voltages across the capacitor 202 are reset to
respective reset voltages by a reset pulse φRES.

Then, when the transfer switch 203 conducts with a transfer pulse φT, a signal is transferred to an end of the
capacitor 202 and a potential at the end of the capacitor is changed to, for example, a low level or high level. A voltage
of an end of the capacitor 202 connected in common changes according to the capacity division for an input. When
an input terminal voltage of the inverter 206 changes from the logical inversion voltage, an output terminal voltage of
the inverter 206 is inverted in response to it. When each signal is entered in response to N inputs, the sum of the N capacity
division outputs is entered to the input terminal of the inverter 206.

In this embodiment, signals of opposite polarity are entered into the weighted terminal 503 having a three-fold
capacitor value and the weighted terminals 501 and 502 having one-fold and two-fold capacitor values; therefore, the
amounts of voltage changes at an end of the capacitors 202 connected in common cancel each other. The capacitors 202
connected at respective input terminals other than the weighted input terminals have almost the same capacitor values.
Accordingly, if a majority of HIGH LEVEL signals are entered out of N inputs, the input terminal of the inverter 206 shifts
from the logical inversion voltage to a higher potential and HIGH LEVEL is output to the output terminal 211 of the sense
amplifier. If a majority of LOW LEVEL signals are entered, LOW LEVEL is output.

By using the above configuration, the circuit in Fig. 18 can serve as a majority logic circuit for outputting a logical
value which is in the majority of two or more inputs. Fig. 17 shows a seven-input ND as an example.

In this drawing, signals are entered into the majority logic circuit block 401. Assuming that C is a capacitor value
for connection with an input terminal path, the majority logic circuit block 401 can be considered as a 13-input majority
logic circuit having a configuration in which high level signals are entered from the weighted input terminal to three Cs
out of 13 Cs connected in common, low level signals are entered from the weighted input terminals to other three Cs,
and signals from 402 are entered into the other seven terminals.

Accordingly, when an input value is entered, if the high level signals are in the majority, in other words,
if there are four or more HIGH LEVEL inputs in seven inputs, HIGH LEVEL is output from the majority logic circuit
block. In Table 1, S3 indicates an output value of a 13-input majority logic circuit block for each HIGH LEVEL input
count. Subsequently, an output signal is latched in the latch circuit 12 with pulses φLAT1 and φLAT2. For example, if
four or more inputs are HIGH LEVEL in seven inputs, HIGH LEVEL is entered into the weighted input terminal 501 and
LOW LEVEL is entered into the weighted input terminals 502 and 503.

If six or more inputs are HIGH LEVEL in seven inputs in signals entered to input terminals other than the weighted
input terminal, the 13-input majority logic circuit determines it as a majority in total and outputs HIGH LEVEL. If four
or more and five or less inputs are HIGH LEVEL in seven inputs, it is not determined as a majority and LOW LEVEL
is output. In the same manner, outputs in Table 1 can be obtained by changing signals to be entered into a weighted
terminal by switching the polarity of an output signal or the switch 403.
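
The reuse of one 13-input block for all three output bits can be sketched as follows. This is a behavioural model only, with hypothetical names (`block401`, `nd7_time_shared`). The text states the terminal settings for the first pass (503 HIGH, 501 and 502 LOW) and for the second pass when four or more inputs are HIGH (501 HIGH, 502 and 503 LOW); the remaining settings in the sketch are assumptions, chosen so that the block reproduces a binary count, and are not taken from the source.

```python
def block401(inputs, t501, t502, t503):
    """13-input majority: seven unit inputs plus terminals weighted 1, 2 and 3."""
    total = sum(inputs) + 1 * t501 + 2 * t502 + 3 * t503
    return 1 if total >= 7 else 0  # majority of 13 unit capacitors


def nd7_time_shared(inputs):
    """Produce (S3, S2, S1) by operating the same block three times."""
    assert len(inputs) == 7
    # Pass 1 (from the text): 503 HIGH, 501 and 502 LOW.
    s3 = block401(inputs, 0, 0, 1)
    # Pass 2: the S3 = 1 case is from the text; the S3 = 0 case is an assumption.
    s2 = block401(inputs, 1, 0, 0) if s3 else block401(inputs, 0, 1, 1)
    # Pass 3 (assumed settings): weighted HIGH total of 6, 4, 2 or 0.
    settings = {
        (0, 0): (1, 1, 1),
        (0, 1): (1, 0, 1),
        (1, 0): (0, 1, 0),
        (1, 1): (0, 0, 0),
    }
    s1 = block401(inputs, *settings[(s3, s2)])
    return s3, s2, s1


if __name__ == "__main__":
    for high in range(8):
        print(high, nd7_time_shared([1] * high + [0] * (7 - high)))
```

Under these assumed settings the three passes yield the binary representation of the HIGH LEVEL input count, matching the role of S3, S2 and S1 in Table 1.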

According to this circuit configuration, the number of the high level input signals can be converted to a binary
number in three places to be output out of two or more inputs as shown in Table 1 in an extremely small size of a circuit
with lower power consumption. Other description is the same as for the third embodiment, but this configuration makes
it possible to reduce the number of the elements further, to reduce power consumption in a small size of a circuit, and
to form a high-speed semiconductor device due to a parallel operation.

[5th embodiment]

Another configuration of the ND section for parallel batch additions in the first to fourth embodiments is described
below.

It is an operation method in which two or more NDs are integrated out of the 15 NDs in the third embodiment.
Its example is shown in Fig. 20 on the basis of a 2x2 majority logic circuit applied to a 2x2 multiplier. In other
words, it is a majority logic circuit in the first stage in Fig. 13 in the third embodiment. Data on the first place (x0y0 in a
place of 2^0) is entered into a unit capacitor C. Two pieces of data on the second place (x1y0 and x0y1 in a place of 2^1)
are entered into terminals each having two-fold capacity 2C, and therefore, one input is counted as two. Further, data on
the third place (x1y1 in a place of 2^2) is weighted 2^2 and one input is counted as four.
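
The weighting described above can be checked arithmetically: if each partial product is counted with the weight of its place (C, 2C, 2C, 4C), the weighted count is the product itself. The sketch below is only this arithmetic check; the function name `weighted_count_2x2` is hypothetical and the conversion of the count to binary by the majority logic blocks is not modelled.

```python
def weighted_count_2x2(x, y):
    """Weighted sum of the four partial products of two 2-bit operands."""
    x0, x1 = x & 1, (x >> 1) & 1
    y0, y1 = y & 1, (y >> 1) & 1
    return (
        1 * (x0 & y0)                  # place 2^0, unit capacitor C
        + 2 * ((x1 & y0) + (x0 & y1))  # place 2^1, two terminals of 2C
        + 4 * (x1 & y1)                # place 2^2, one terminal of 4C
    )


if __name__ == "__main__":
    for x in range(4):
        for y in range(4):
            print(x, y, weighted_count_2x2(x, y))
```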

Other description is the same as for the first to fourth embodiments and up to seven inputs are output in the binary
mode with four inputs in each ND. By using this weighting method, the parallel batch addition function can be used.

For example, in the 8x8 bit multiplier in the third embodiment, the NDs for adding data on places whose weights
are 0, 1, 2, and 3 are integrated into an ND91 as shown in Fig. 21, and the areas on places whose weights are (4, 5,
6), (7, 8, 9), and (10, 11, 12, 13, 14) are integrated into each area (ND92 to ND94 in Fig. 21).

Although the number of the inputs and that of the outputs are indicated by numbers for NDs in Fig. 21, any ND
can be used as long as it can count up to 56 inputs. Respective NDs generate 6-bit outputs. Assuming that A, B, C, and D
are the 6-bit data sequences having places in an ascending order, A and C can be integrated into a data sequence P
and B and D into a data sequence Q in a step for forming new data sequences, which is a step S113 in the flowchart
in Fig. 11 of the third embodiment.

Accordingly, only an additive operation P+Q should be performed. In other words, in this embodiment, an 8x8 bit
multiplication is executed in two steps: a step in which a parallel batch addition is performed in NDs and a step in which
a 16-bit addition is performed only once.

Other description is the same as for the third and fourth embodiments, but this configuration of the multiplication
circuit makes it possible to reduce the number of the elements further, to reduce power consumption in a small size of
a circuit, and to form a higher speed multiplier due to a parallel operation and fewer addition stages.

Although contiguous places are integrated in the example of general weighting in this embodiment, the invention
is not limited to this method, and any of more efficient methods can be flexibly used such as discontinuous weighting
(for example, when data on the 2^0 place and on the 2^2 place are integrated to be entered) or dividing data on one place
to enter it with different weights for different NDs (for example, dividing data on 2^5 places into two classes and entering
them into different NDs).

[6th embodiment] 

Although there are NDs in one stage for adding partial products in parallel and there are full adders in the rear
stage in the third to fifth embodiments, NDs can be connected further after the NDs. Referring to Fig. 22, it is explained
by giving an example of a 32x32 bit multiplier. It is a drawing around a 32-input ND. In the 32-input ND, 7-bit outputs
are made. As for the lower bits, 6-bit outputs are made up to 16-31 input NDs, and these outputs can be considered
as partial products which are operation results from the NDs.

Accordingly, 3-bit outputs can be made by using NDs again. In this state, data can be classified to three 64-bit
data A, B, and C as data sequences and the operation can be performed only in two full adder stages. This method is
effective for an operation with a great number of the bits particularly, though it depends on the performance of adders
and NDs. In addition, there is no problem when this method is used in combination with weighted NDs as described
in the fifth embodiment.

[7th embodiment]

A multiplier of this embodiment is illustrated in Fig. 23. In this embodiment, an addition is made for (S73, S72, S71,
S70) and (S102, S101, S100), which are outputs from NDs, first. Practically, S73 is just added to S102, S101 and S100.
As a result of this step, three 15-bit data sequences A, B, and C are formed in this embodiment, though four 16-bit
data sequences are formed in the third embodiment. Accordingly, the elements are reduced further in comparison with
the third embodiment.

[8th embodiment]

A method of multiplying three pieces of data is described in the eighth embodiment. Although it can be explained
by using 2-bit data as a simple example, the method can be achieved in the same manner when the number of the
bits of each data is not identical in multiple bit data, and the number of the data should not be limited to three pieces
and it can be expanded to arbitrary multiple data.

The values of a multiplication are assumed to be A(a1 a0), B(b1 b0), and C(c1 c0). In a calculation (A x B x C),
a partial product ai·bj·ck is generated as shown in Fig. 24 and a multiplication result Q can be obtained by calculating
the sums. To form a partial product ai·bj·ck, each AND should be taken in the same manner as for the above
embodiments. Even for three or a greater number of data, the operation speed for the partial product is high and a partial product
can be formed in a parallel processing.

Subsequently, a batch addition is performed for data on the same places in this partial product. Although data on
the same places is added together each in the example of Fig. 24, it is apparent that weighting or other steps can be
also used as described in the above embodiments. In Fig. 24, the batch addition output results on the places enclosed
by elliptic frames are integrated into one data sequence due to no common places, which results in three 7-bit data
sequences, and they are added together to obtain a multiplication result Q of three 3-bit data.

More particularly, data on the lower two places (2^0 and 2^1 places) need not be added in a 7-bit data sequence and
S00 and S10 are output results for them, respectively. Therefore, in an addition step which is the second step, three
5-bit data sequences are added, which leads to a higher operation speed.
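
The two steps of this embodiment (forming all partial products by AND, then counting them per place) can be sketched as below. The sketch assumes 3-bit operands by default and uses hypothetical names (`bits`, `multiply3`); the counting that the NDs perform in hardware is modelled with ordinary integer addition, so the code only checks that the per-place counts carry enough information to recover the product.

```python
from itertools import product


def bits(value, width):
    """Little-endian list of the low `width` bits of value."""
    return [(value >> i) & 1 for i in range(width)]


def multiply3(a, b, c, width=3):
    """Return (per-place counts of partial products, reconstructed product)."""
    counts = [0] * (3 * (width - 1) + 1)
    triples = product(
        enumerate(bits(a, width)),
        enumerate(bits(b, width)),
        enumerate(bits(c, width)),
    )
    for (i, ai), (j, bj), (k, ck) in triples:
        # Partial product ai*bj*ck is a three-input AND on place i + j + k.
        counts[i + j + k] += ai & bj & ck
    # Second step: add the counts, each shifted to its place.
    result = sum(count << place for place, count in enumerate(counts))
    return counts, result


if __name__ == "__main__":
    print(multiply3(7, 7, 7))
    print(multiply3(5, 3, 6))
```

With all-ones 3-bit operands the place with the most partial products receives seven inputs, so a seven-input ND is the largest one needed in this assumed case.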

As apparent from these explanations, also in a multiplication of three or more data sequences, the operation method
of this invention is effective, so that a high speed multiplication can be performed with a small number of elements,
which leads to lower power consumption.

[9th embodiment] 

In this embodiment, an addition method is described for a plurality of multiple bit data including at least one negative 
value by giving an example of adding 63 7-bit data sequences. 

The negative value is represented by a 2's complement. In other words, the most significant bit in the 7 bits
indicates a sign: a positive value if it is set to 0 and a negative value if it is set to 1. A data sequence X=(X6 X5 X4 X3 X2 X1 X0)
is expressed by the following:

$$X = -X_6 \cdot 2^6 + \sum_{n=0}^{5} X_n \cdot 2^n$$


Fig. 25 illustrates this embodiment. In this embodiment, a batch addition is made for data on respective places of
63 7-bit data sequences to add them, first. The first addition is performed by using NDs in the same manner as for the
above embodiments. Since a data sequence is in 7 bits including a flag in this embodiment, seven NDs are used.

Processing in this addition step is performed in parallel; therefore, the operation speed is determined by the speed
of a single ND. Since the operation speeds of all NDs are identical, it is determined by the operation speed of a single
ND. In addition, 63 7-bit data sequences are added in this embodiment; therefore, the maximum number of the inputs
to the NDs is 63.

Since a carry occurs in a general addition, the operation speed is decreased due to propagation of the carry. In
this embodiment, however, a batch addition without carries is performed including a flag in parallel, so that the addition
stages can be reduced and a high speed operation is achieved. Although the example is shown only with 63 7-bit data
sequences in this embodiment, the invention is not limited to it, and there can be various numbers of the bits in a
plurality of multiple bit data including at least one negative value.

Subsequently, a desired addition result Q is obtained at a high speed by performing a second addition step for 
adding all of the eight addition results represented in a binary mode. 

Then, flag bits indicating positive or negative signs are described below.

If a flag bit is set to 1, it indicates (-1)·2^6 in the 2's complement notation. Accordingly, if the number of 1s is indicated
by n in 63 inputs, -n·2^6 indicates the value. The minimum value is obtained when n is equal to 63 and an expression
-63·2^6 = -(2^6 - 1)·2^6 holds true in this state without exceeding 2^12 in an absolute value. When (SF5, SF4, SF3, SF2, SF1,
SF0) is assumed to be a binary notation of n and (BSF5, BSF4, ..., BSF0) is assumed to be its inversion, by using this negative
value as a 2's complement and the 13th bit as a flag, the following expressions are obtained:

$$F = -n \cdot 2^6 = -2^6 \sum_{n=0}^{5} SF_n \cdot 2^n$$

$$F = 2^6 \left( -2^6 + \sum_{n=0}^{5} BSF_n \cdot 2^n + 1 \right)$$



Accordingly, F should be added since the following expression holds true:

A - |F| = A + F (|F| + F = 0; the 13th bit is set to 0 since 1 plus 1, and there is no 14th bit).

In Fig. 25, flag bits of the operands are converted to binary notation SFs by NDs and BSFs are generated by taking
their inversion. Furthermore, F can be generated by adding 1 for 2's complement notation of flag data indicating a
negative value to the 7th bit and the 13th bit.
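
The treatment of the flag bits can be checked with a short numerical sketch. It assumes 7-bit 2's complement words and at most 63 operands, as in the example; the names `to_signed` and `add_signed_7bit` are hypothetical, and the NDs are modelled by plain counting. The sketch forms F from the inverted flag count BSF exactly as in the expression F = 2^6(-2^6 + BSF + 1) and compares the result with ordinary signed addition.

```python
def to_signed(word):
    """Interpret a 7-bit pattern (0..127) as a 2's complement value."""
    return word - 128 if word & 0x40 else word


def add_signed_7bit(words):
    """Add up to 63 7-bit 2's complement words by per-place counting."""
    assert len(words) <= 63
    # One count per numeric place (what six of the seven NDs would output).
    counts = [sum((w >> place) & 1 for w in words) for place in range(6)]
    # Count of flag bits set to 1 (the output SF of the seventh ND).
    n_flags = sum((w >> 6) & 1 for w in words)
    # BSF is the 6-bit inversion of SF.
    bsf = n_flags ^ 0b111111
    # F = 2^6 * (-2^6 + BSF + 1), which equals -n_flags * 2^6.
    f = (1 << 6) * (-(1 << 6) + bsf + 1)
    numeric = sum(count << place for place, count in enumerate(counts))
    return numeric + f


if __name__ == "__main__":
    sample = [(i * 37 + 11) % 128 for i in range(63)]
    print(add_signed_7bit(sample), sum(to_signed(w) for w in sample))
```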

After that, a desired addition result Q can be obtained at a high speed by performing a second addition step for 
adding all of the seven addition results represented in the binary mode. 

As mentioned above, flag bits can be treated in the same manner as for bits representing numerals, and 63 pieces
of 7-bit data including at least one negative value are converted to seven pieces of 6-bit data by being passed to the
NDs. The values 1 for the 7th bit and the 13th bit for 2's complement conversion for flags can be incorporated without any
operation, for example, as shown by (b) in Fig. 25, by adding each to an Son or BSF data sequence.

In this method, the 13th bit can be used as a flag:

$$\mathrm{out} = (-1) \cdot Q_{12} \cdot 2^{12} + \sum_{n=0}^{11} Q_n \cdot 2^n$$


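Assuming that X_ij indicates the (j+1)th bit of the ith data sequence, the above calculation can be expressed
numerically, as follows:

$$\mathrm{out} = \sum_{i=1}^{63} \left( (-1) \cdot X_{i6} \cdot 2^6 + \sum_{n=0}^{5} X_{in} \cdot 2^n \right) = 2^6 \left( -2^6 + \sum_{n=0}^{5} BSF_n \cdot 2^n + 1 \right) + \sum_{i=1}^{63} \left( \sum_{n=0}^{5} X_{in} \cdot 2^n \right)$$
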

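Therefore, when out is greater than or equal to 0, the following is obtained from the above expression:

$$\sum_{i=1}^{63} \sum_{n=0}^{5} X_{in} \cdot 2^n + 2^6 \left( \sum_{n=0}^{5} BSF_n \cdot 2^n + 1 \right) \ge 2^{12}$$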












In other words, 1 is added to the 13th bit to be set to 0.
When out is smaller than 0:

$$\sum_{i=1}^{63} \sum_{n=0}^{5} X_{in} \cdot 2^n + 2^6 \left( \sum_{n=0}^{5} BSF_n \cdot 2^n + 1 \right) < 2^{12}$$



Therefore, the 13th bit remains 1, which indicates a negative value.

In a general description, the mth bit is a flag bit and an addition (subtraction) can be performed for n operands by
use of NDs by setting the (m + [log2 n])th bit to 1 and adding 1 to the mth bit for 2's complement notation, since the flag
bits can be treated as bits equivalent to the numerical bits other than flag bits, which makes the operation easier.

[10th embodiment]

This embodiment will be explained below by giving an example in which the operation speed of the second addition
step is increased to achieve a higher operation speed of the addition in the 9th embodiment.

Fig. 26 shows a configuration of an adder of this embodiment. It shows an example of adding seven pieces of 8-bit
data including flags, in which addition results with no common places are integrated into one 11-bit data sequence out
of 3-bit output data obtained from NDs. This operation is explained by using the example in Fig. 26. As for flag bits,
they are considered as 2's complements as described in the ninth embodiment and outputs from NDs are passed
through an inverter. Further, 3-bit output data and data represented by 0001 are added by a full adder to add 1, though
the invention is not limited to it. In addition, 1 of MSB indicated by (a) is present in the same manner as for
the ninth embodiment.

In this drawing, 3-bit or 4-bit output data on the places enclosed by elliptic frames can be integrated into an 11-bit
data sequence A since they have no common places. (There are places which do not have a value when three addition
results are integrated, and they are set to 0. In this example the first place is set to 0.) It is important that no operation
is performed in this processing, with only wiring as processing in the circuit, though it is treated as a step on its algorithm.
In this step, eight results of the addition can be converted to three 11-bit data sequences. A delay time is so short
that it can be ignored in comparison with other steps. In the end, by adding three 11-bit data sequences, a final operation
result can be obtained. Since three 11-bit data sequences are used in the example of Fig. 26, a final result of the
addition can be obtained by a passage of full adders in only two stages as shown in Fig. 5 and a plurality of multiple
bit data can be added at a high operation speed.

Next, this embodiment is described below for a general operation in which m pieces of max. n-bit data sequences
are added. The result of addition output from n NDs has max. [Log2 m] bits; therefore, it can be converted to max.
[Log2 m] pieces of (n + [Log2 m] - 1)-bit data sequences. In the end, by adding the [Log2 m] pieces of (n + [Log2 m] - 1)-bit data sequences, a final
operation result can be obtained. The number of full adder passage stages can be expressed by ⌈log2 [Log2 m]⌉. From
the above expression, the number of full adder passage stages can be kept low even if the number of the multiple
bit data sequences becomes higher.
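
The growth of the stage count can be illustrated with small integer helpers. This is a sketch under two assumptions that are not stated explicitly in the text: [Log2 m] is read as the number of bits needed to write the count m in binary, and the data sequences are added pairwise in a tree of two-input adders. The function names are hypothetical.

```python
def nd_output_bits(m):
    """Bits needed to represent a count between 0 and m (assumed reading of [Log2 m])."""
    return m.bit_length()


def adder_stages(num_sequences):
    """Stages of two-input adders needed to sum num_sequences values in a tree."""
    return (num_sequences - 1).bit_length()


if __name__ == "__main__":
    for m in (7, 63, 1023):
        k = nd_output_bits(m)
        print(m, "inputs ->", k, "data sequences ->", adder_stages(k), "adder stages")
```

For seven inputs this gives three data sequences and two adder stages, which agrees with the example of Fig. 26 described above.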

Flag bits are passed through small full adders; if they are arranged in a single stage, ⌈log2 [Log2 m]⌉ + 1 is obtained.
If the full adders are not connected in the locations, but a flag bit is indicated by 1, there are [Log2 m] + 1 data sequences,
assuming that a data sequence in which only the flag bits are 1s is added last, and the number of full adder passage stages
can be expressed by ⌈log2 ([Log2 m] + 1)⌉. In any case, even if the number of the bits is increased, the number of full adder
passage stages can be kept low.

[11th embodiment]

In this embodiment, an addition method for a plurality of multiple bit data including at least one negative value is
described below by giving an example of adding 63 7-bit data sequences.

A negative value is represented by 1's complement in this embodiment. The 1's complement has a merit that
preprocessing can be easily simplified since only inversion should be made for a numeric bit.

Referring to Fig. 27, this embodiment is illustrated. In this drawing, to add 63 7-bit data sequences, data on
respective places are added together. This addition is performed by 63-input six-output NDs.

Processing in the addition step is performed in parallel in this embodiment; therefore, the operation speed is
determined by an operation speed of each ND. In this embodiment, seven NDs are used because of 7-bit data sequences.
In addition, since 63 7-bit data sequences are added in this operation, the number of the inputs to the NDs is equal to 63.

Since a carry occurs in a general addition, the operation speed is decreased due to propagation of the carry. In
this embodiment, however, a batch addition without carries is performed in parallel, so that the addition stages can be
reduced and a high speed operation is achieved. Although the example is shown only with 63 7-bit data sequences
for the addition in this embodiment, the invention is not limited to it, and there can be various numbers of the bits in a
plurality of multiple bit data including at least one negative value.

Subsequently, a desired addition result Q is obtained at a high speed by performing a second addition step for 
adding all of the eight addition results represented in the binary mode. 

Although the description of flag bits is the same as for the ninth embodiment, an addition is required by the number
of flags (the number of negative data sequences) to convert a 1's complement to a 2's complement. It corresponds to
data represented by (SF5, SF4, SF3, SF2, SF1, SF0), which is indicated by (a) in Fig. 27. It is also an output from NDs
and its inversion is generated by an inverter to create data sequences BSFs. (b) indicates 1 for 1's complement notation
of flag data indicating a minus sign. For its ND, an ND described in Fig. 13 or Fig. 17 can be used.
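
The reason the flag count has to be added can be seen from the relation between the two notations: for the same bit pattern, the 1's complement value exceeds the 2's complement value by exactly the flag bit. The sketch below checks this for 7-bit words; the function names are hypothetical and the width of 7 bits follows the example of this embodiment.

```python
def twos_complement_value(word, width=7):
    """Value of a bit pattern read as 2's complement."""
    sign = (word >> (width - 1)) & 1
    return word - (1 << width) if sign else word


def ones_complement_value(word, width=7):
    """Value of a bit pattern read as 1's complement."""
    mask = (1 << width) - 1
    sign = (word >> (width - 1)) & 1
    return -((~word) & mask) if sign else word


def sum_ones_complement(words, width=7):
    """Sum 1's complement words as 2's complement values plus the flag count."""
    n_flags = sum((w >> (width - 1)) & 1 for w in words)
    return sum(twos_complement_value(w, width) for w in words) + n_flags


if __name__ == "__main__":
    sample = [(i * 29 + 5) % 128 for i in range(63)]
    print(sum_ones_complement(sample), sum(ones_complement_value(w) for w in sample))
```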

[12th embodiment]

In this embodiment, an operation is performed by integrating eight NDs in the 10th embodiment into a plurality of
units. In Fig. 28, a majority logic circuit is described below by giving an example of a 2-bit adder of X (represented by
X1X0) + Y (represented by Y1Y0).

Data on the first place, X0 and Y0, is entered into a unit capacitor C. Two pieces of data on the second place are
entered into terminals each having two-fold capacity 2C; therefore, one input is counted as two.

For example, in the 10th embodiment, as shown in Fig. 29, NDs for adding data on places whose weights are 0,
1, and 2 can be integrated into an ND91, and in the same manner, areas on places whose weights are 3, 4, and 5 can be
integrated into an ND92. Although the number of the inputs and that of the outputs are indicated by numbers for NDs
in Fig. 29, any ND can be used as long as it can count up to 49 with a maximum of 21 inputs. Respective NDs generate
6-bit outputs. In the ND91, data on place 0 is entered into the unit capacitor C, data on the first place is entered into a
2C, and data on the second place is entered into a (2^2 =) 4C.

In addition, a flag bit (the 8th bit) can be combined with a numeric bit (the 7th bit, MSB (most significant bit)). In
Fig. 29, the flag bit is entered into an ND93 through an inverter. Although it is passed through the inverter after a
passage through the ND for the inversion in the embodiment mentioned above, either of the orders can be applied.
According to this embodiment, however, the total capacity of the NDs is assumed to be (2^n - 1)C (n is an integer).

Further, (00010) for a 2's complement is added to the flag bit to be entered as a weight to the capacitor 2C. The
numeric bit is entered to the capacitor C. As shown in Fig. 29, each output is added to 1 on the nth bit for a 2's
complement and two data sequences can be deleted at a time; therefore, only two data sequences should be added.
The processing becomes more parallel by using weighting, which contributes to increasing the operation speed and
reducing required elements and power consumption.

Although contiguous places are integrated in the example given for description of general weighting in this
embodiment, the invention is not limited to this method, and any of more efficient methods can be flexibly used such as
discontinuous weighting (for example, when data on the 2^0 place and on the 2^2 place are integrated to be entered) or
dividing data on one place to enter it with different weights for different NDs.

[13th embodiment]

In this embodiment, an addition method for a plurality of multiple bit data is described by giving an example of 
adding seven 8-bit data sequences. 

Referring to Fig. 30, this embodiment is illustrated. In this embodiment, a batch addition is made for data on
respective places of seven 8-bit data sequences to add them first. This addition is executed by NDs.

In the same manner as for the second embodiment, the operation speed of this addition step is determined by the
operation speed of a single ND. Eight NDs are used, and the maximum number of the inputs to the NDs is equal to
seven. In addition, as in the above embodiments, an addition without carries is performed in parallel, which leads to
a higher operation speed, and the plurality of multiple bit data can have various numbers of bits.

A desired addition result Q is then obtained at a high speed by performing a second addition step for adding all of the
eight addition results represented in the binary mode.
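The two addition steps described above can be modelled in software as a rough sketch. This is only an illustration under stated assumptions, not the circuit itself: the function names `batch_add` and `second_add` and the sample data are introduced here for the example, and each ND is modelled simply as a counter of the 1s found on one place.

```python
def batch_add(words, width):
    """First step: count the 1s on each place (one count per ND, no carries)."""
    return [sum((w >> place) & 1 for w in words) for place in range(width)]


def second_add(counts):
    """Second step: add the per-place counts, each weighted by its place."""
    return sum(count << place for place, count in enumerate(counts))


# Seven 8-bit data sequences (arbitrary sample values, assumed for illustration).
data = [0b10110101, 0b01101110, 0b11111111, 0b00000001,
        0b10000000, 0b01010101, 0b11001100]

counts = batch_add(data, 8)   # eight counts, each at most seven
q = second_add(counts)        # desired addition result Q
print(counts, q)
```

Because every place is counted independently, the first step has no carry chain; carries appear only in the second step.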

Processing can be made more efficient not by simply adding all data, but by integrating a plurality of data into single data
in performing this addition step. Paying attention to the MSBs of the batch addition results in Fig. 30 from this viewpoint, an
expression Si2 ($0 \le i \le 7$) is obtained for each, and there are no common data; therefore, an expression is obtained for the
data sequences as follows:
S72 S62 S52 S42 S32 S22 S12 S02 =
Respective data on the subsequent place and the LSBs (least significant bits) can also be expressed by a single data
sequence in the same manner. No operation is required for the step in which these data sequences are generated.

The above step is explained in a general description below. A batch addition of the data on the third
place generates three pieces of data S30, S31, and S32, and their respective places are 3+0, 3+1, and 3+2. A batch addition
on the mth place generates data on the (m+n)th place ($[\log_2(\mathrm{IN})] \ge n \ge 0$, where IN is the data count; $2 \ge n \ge 0$ for
IN = 7). A batch addition for the m'th ($m' \ne m$) place generates data on the (m'+n)th place in the same manner. Since
$m + n \ne m' + n$, there are no common places in the data. Accordingly, they can be integrated into single data without any operation.

With the operation made more efficient like this, data can be integrated into three data A, B, and C in the example
shown in Fig. 30. It is the same as for the second embodiment that no operation is performed in this processing, with
only wiring as processing in the circuit, though it is treated as a step on its algorithm. A delay time is so short
that it can be ignored in comparison with other steps. In the end, by adding three 8-bit data sequences, a final operation
result can be obtained. Since three 8-bit data sequences are used in the example of Fig. 30, a final result of the addition
can be obtained by a passage of full adders in only two stages as shown in Fig. 5, and a plurality of multiple bit data
can be added at a high speed.
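The integration step for seven 8-bit inputs can be illustrated with a short sketch. The names `batch_add` and `integrate` and the sample data are assumptions made for this example. Bit n of the count from place m is simply placed on place m+n of sequence n, which models the wiring-only rearrangement; which of the three sequences the text calls A, B, or C is not assumed here.

```python
def batch_add(words, width):
    """Count the 1s on each place (one count per ND)."""
    return [sum((w >> place) & 1 for w in words) for place in range(width)]


def integrate(counts, count_bits):
    """Collect bit n of every count into sequence n at place m + n (wiring only)."""
    sequences = []
    for n in range(count_bits):
        seq = 0
        for m, count in enumerate(counts):
            # Different m give different places m + n, so OR never collides.
            seq |= ((count >> n) & 1) << (m + n)
        sequences.append(seq)
    return sequences


data = [0xB5, 0x6E, 0xFF, 0x01, 0x80, 0x55, 0xCC]   # seven 8-bit values (arbitrary)
counts = batch_add(data, 8)
sequences = integrate(counts, 3)    # counts of up to 7 need three bits
total = sum(sequences)              # only these three sequences need real adders
print([bin(s) for s in sequences], total)
```

Only the final `sum(sequences)` corresponds to arithmetic; the `integrate` function moves bits without adding anything.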

Next, this embodiment is described below for a general operation in which m pieces of maximum n-bit data sequences are
added. The result of addition output from the n NDs has maximum $[\log_2 m]$ bits; therefore, it can be converted
to $[\log_2 m]$ data sequences. In the end, by adding the $[\log_2 m]$ data sequences, a final operation result can be obtained.
The number of full adder passage stages can be expressed by $\log_2 [\log_2 m]$ from the above expression, so the number
of full adder passage stages can be kept low even if the number of the multiple bit data sequences becomes
higher. It is apparent that this efficient method can be applied to two or more pieces of data.

[14th embodiment] 

In this embodiment, a multiplication between multiple bit data is described by giving an example in Fig. 31. The
description is made below by giving an example of an (8×8)-bit multiplier, and it can be expanded to a general (m×n)-bit
multiplication.

Assume that X × Y = Q, where X (X7 X6 X5 X4 X3 X2 X1 X0) is a multiplicand and Y (Y7 Y6 Y5 Y4 Y3 Y2 Y1 Y0) is a
multiplier. As described in Figs. 3A and 3B, Q is expressed with maximum 16 bits. Q has the maximum m+n bits
for m×n bits.

As shown in Fig. 31, a partial product X × Yj is created first. Although the partial product is calculated by taking an AND
between each bit Xi of a multiplicand X and a multiplicator Yj like a general CMOS multiplier, any other method, for
example the method described in the third embodiment, can also be used.

Subsequently, the sums of the partial products on respective places in Fig. 31 are added for each place by NDs
at a time. Since processing is performed in parallel in this addition step, it is suitable for a high speed operation. In an
(m×n)-bit multiplication circuit, (m+n-1) units of NDs are used. The maximum number of the inputs to the NDs is equal
to Min(m,n). In an example of an (8×8)-bit multiplier shown in Fig. 31, 15 NDs are used. Maximum eight inputs are
made (for operation X7Y0 + X6Y1 + X5Y2 + X4Y3 + X3Y4 + X2Y5 + X1Y6 + X0Y7).
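A small software model may make the column structure of the partial products concrete. The function name `partial_product_columns` and the operand values are assumptions for illustration; each column corresponds to the inputs of one ND, and the ND is modelled as a plain count of 1s.

```python
def partial_product_columns(x, y, m, n):
    """Group the AND partial products Xi*Yj by place i + j (one list per ND)."""
    columns = [[] for _ in range(m + n - 1)]
    for i in range(m):
        for j in range(n):
            columns[i + j].append(((x >> i) & 1) & ((y >> j) & 1))
    return columns


x, y = 0xD3, 0x9A                      # arbitrary 8-bit operands
columns = partial_product_columns(x, y, 8, 8)
counts = [sum(col) for col in columns]  # batch addition, one ND per place
q = sum(count << place for place, count in enumerate(counts))
print(len(columns), max(len(col) for col in columns), q)
```

For 8×8 bits this yields m+n-1 = 15 columns with at most Min(m,n) = 8 inputs, and the weighted sum of the counts reproduces the product.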

This number of NDs applies when NDs are also used for locations where NDs can be replaced by wiring in a one-input
one-output arrangement. If the replaceable NDs are omitted, (m+n-3) NDs are used. Further, if NDs are used
only for three or greater inputs, excluding the locations where NDs can be replaced by two-input two-output HAs (although
an HA is a kind of ND, it is distinguished from an ND), (m+n-5) NDs can be used.
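The three ND counts can be checked by counting the inputs on each place. The sketch below is an illustration only; the function name `nd_counts` is an assumption introduced here, and the equalities with m+n-3 and m+n-5 hold when both m and n are at least 3.

```python
def nd_counts(m, n):
    """Return (all places, places with >= 2 inputs, places with >= 3 inputs)."""
    sizes = [sum(1 for i in range(m) if 0 <= place - i < n)
             for place in range(m + n - 1)]
    return (len(sizes),
            sum(size >= 2 for size in sizes),   # one-input places replaced by wiring
            sum(size >= 3 for size in sizes))   # two-input places replaced by HAs


print(nd_counts(8, 8))
print(nd_counts(4, 6))
```

The two end places always have one input each and their neighbours two inputs each, which is where the reductions by 2 and by 4 come from.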

Generally, for three or greater inputs, an additive operation becomes complicated and the operation speed is
lowered due to propagation of carries, which may occur particularly in this state. Since this embodiment has a feature
of performing an operation without carries by adding data together, a high speed operation is achieved.

By performing a second addition step for adding all of the (m+n-1) addition results indicated in the binary mode after
that, a desired multiplication result Q is obtained at a high speed.

Further, to reduce the number of the additions, data is rearranged according to the 13th embodiment. It is explained
by using the example in Fig. 32. The sum of the partial products of the 8th place is equal to 4 (bits). Due to 3-bit outputs
from other batch additions, for an S73 there are no data satisfying the conditions. Accordingly, it is considered as single
data (A). As for an S72, the partial products from the 4th to 12th places are 3-bit data and there is data with no common
places to the S72. It corresponds to a 9-bit data (B) represented by Si2 ($3 \le i \le 11$). In the same manner, data can be
integrated into one data sequence as Si1 and Si0, and the data can be rearranged to four pieces of data (A, B, C, D)
finally. As mentioned above, no operation is performed in this processing, with only wiring as processing in the circuit,
though it is treated as a step on its algorithm.
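The rearrangement into four data sequences for the (8×8)-bit case can be sketched as follows. The function names `column_counts` and `integrate` and the operand values are assumptions for illustration; the mapping of the bit-3, bit-2, bit-1, and bit-0 sequences to A, B, C, and D follows the description of S73 and Si2 above.

```python
def column_counts(x, y, m, n):
    """Batch addition: count the 1s among the partial products on each place."""
    counts = [0] * (m + n - 1)
    for i in range(m):
        for j in range(n):
            counts[i + j] += ((x >> i) & 1) & ((y >> j) & 1)
    return counts


def integrate(counts, count_bits):
    """Wiring-only step: bit k of the count on place i goes to place i + k."""
    sequences = []
    for k in range(count_bits):
        seq = 0
        for i, count in enumerate(counts):
            seq |= ((count >> k) & 1) << (i + k)
        sequences.append(seq)
    return sequences


x, y = 0xFF, 0xB7                                      # arbitrary 8-bit operands
d, c, b, a = integrate(column_counts(x, y, 8, 8), 4)   # bit 0 .. bit 3 sequences
product = a + b + c + d
print(bin(a), bin(b), bin(c), bin(d), product)
```

In this model A can only hold the single bit S73, and B only the nine bits Si2 for i from 3 to 11, in line with the text.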

A delay time is so short that it can be ignored in comparison with other steps. In this step, (m+n-1) pieces of addition
results can be converted to $[\log_2(\mathrm{Min}(m,n))]$ pieces of data sequences. A final operation result can be obtained by
adding the $[\log_2(\mathrm{Min}(m,n))]$ pieces of data sequences as the last processing.

Since four pieces of data are used in the example of Fig. 32, a final product can be obtained by a passage of full
adders in only two stages as shown in Fig. 9. In general, the number of full adder passage stages can be expressed
by $\log_2 [\log_2(\mathrm{Min}(m,n))]$ in the same manner as for the third embodiment. Therefore, even if m and n are increased,
the number of full adder passage stages can be kept low according to the graph in Fig. 10. In other words, even
if multiple bits are increased, the high speed operation is achieved with fewer elements and less power consumption.
In addition, for additions of the S73 and Si2, 4-bit (5-bit output including a carry) adders can be used since the addition
is made for the 11th or higher places (S73 + S112 S102 S92 S82). This operation method can be applied to a
multiplication circuit having the same configuration shown in Fig. 12, which is the same as for the third embodiment.

According to this embodiment, NDs having the same configuration as for Fig. 13 can be used. With this circuit
configuration, the number of the HIGH LEVEL inputs out of a plurality of inputs can be converted to a binary number with
three places to be output, as shown in Table 1 in Fig. 13. In this circuit, the values converted to binary numbers
are output from MSBs, and respective MSBs are output at almost the same timing. If these NDs are used, MSBs output
at almost the same timing can be integrated into single data, which is more effective for data compression in the above
operation.

By using the above configuration of the multiplication circuit, a high speed multiplier can be formed due to parallel
operations with fewer elements and lower power consumption.

[15th embodiment]

In the same manner as for the fourth embodiment, the configuration in Fig. 17 can be applied to the ND sections
for performing the parallel batch additions described in the above 13th and 14th embodiments. It makes it possible to
convert the number of the inputs of high level signals out of a plurality of inputs to a binary number with three places
to be output in an extremely small circuit with low power consumption, as shown in Table 1 in Fig. 17. In this circuit,
values converted to binary numbers are output from MSBs. Although other description is the same as for the 13th and
14th embodiments, this configuration makes it possible to form a high speed semiconductor device due to parallel
operations with still fewer elements and lower power consumption because of the reduced size.

[16th embodiment]

In this operation method, a plurality of NDs are integrated out of the 15 NDs in the 13th embodiment in the same
manner as for the fifth embodiment. The NDs are assumed to have a configuration in Fig. 21.

Although the number of the inputs and that of the outputs are indicated by numerals for NDs in Fig. 21, any ND
can be used provided that it can count up to 56 inputs. For wiring, up to 21 inputs are applied. Respective NDs generate 6-bit
outputs. Subsequently, data is integrated with rearrangement, and S105, S85, S45, and S05 corresponding to MSBs
are treated according to the rules of the 13th embodiment. S104, S84, S44, and S43 on the subsequent place can be
integrated into single data and it has no places common to the MSB data sequence; therefore, they can also be
integrated into one data sequence. Accordingly, the data can be integrated into two data sequences finally. Other
description is the same as for the 13th and 14th embodiments, but this configuration of the multiplication circuit makes it
possible to reduce the elements further, to reduce power consumption in a small-sized circuit, and to form a higher
speed multiplier due to parallel operations and fewer addition stages.

Although contiguous places are integrated in the example of general weighting in this embodiment, the invention
is not limited to this method, and any more efficient method can be flexibly used, such as discontinuous weighting
(for example, when data on the $2^0$ place and on the $2^2$ place are integrated to be entered) or dividing data on one place
to enter it with different weights for different NDs (for example, dividing data on $2^8$ places into two classes and entering
them into different NDs).

[17th embodiment] 

This embodiment is described below by giving an example in which 63 7-bit data sequences are added. Fig. 33
is an explanation diagram for this embodiment. First, data on respective places are added together to add 63 7-bit data
sequences. This addition is performed in the same circuit as for the 14th to 16th embodiments. Since 7-bit data
sequences are used, seven NDs are used in this embodiment. The operation speeds of respective NDs are identical;
therefore, the entire operation speed is determined by the operation speed of a single ND. In addition, 63 7-bit data
sequences are added in this operation, and therefore, the number of the inputs to the NDs is equal to 63. The NDs
generate 6-bit output data from an MSB in order. By performing an addition with no carries in parallel with NDs, a
high speed operation is achieved. Although the example is shown only with 63 7-bit data sequences for the addition
in this embodiment, the invention is not limited to it, and there can be various numbers of the bits in a plurality of multiple
bit data.

Subsequently, a desired addition result Q is obtained at a high speed by performing a second addition step for
adding all of the eight addition results represented in the binary mode.

This addition step is described below. As mentioned above, the NDs output data from an MSB in order, and the MSBs
from all NDs and the data on the subsequent places are output at the same timing. As for the MSB, the
MSB from an ND for the mth place ($m \ge 1$) is output at the (m+5)th place. In other words, the MSBs from the NDs for the
mth place under the condition of $1 \le m \le 7$ do not have any common places, and they can be integrated into single data
without any operation (areas enclosed by frames in Fig. 33). In the same manner, an output on a place one lower than
an MSB can be combined with data on the (m+4)th place, data on the next place can be combined with data on the (m+3)th
place, and the subsequent data can be integrated like this in the output order sequentially. In Fig. 33, each MSB
can be represented by Si5 ($0 \le i \le 6$) and the subsequent data by Sik ($0 \le i \le 6$); therefore, six pieces of data
under the condition of $0 \le k \le 5$ are sequentially output. These data are added together. Due to the above data
rearrangement, the MSB data Si5 and the data of the subsequent place Si4 are output, and then an operation of Si5+Si4 can be
executed while each ND calculates Si3. In the same manner, after Si3 and Si2 are output, an operation of Si3+Si2 can
be performed while an ND calculates Si1. Accordingly, a high speed operation can be achieved by starting the second
addition step without awaiting all outputs of the results from the NDs.

As an example of an extreme case, if an operation of Si5+Si4 is executed during calculation of Si3, Si3 is added
to a result of the (Si5+Si4) during calculation of Si2, and then an addition is further executed sequentially as shown in
Fig. 34, the entire operation speed can be increased and the elements can be reduced because only one adder is
required. There is an optimum value between an ND operation time and the second addition time, but the second
addition can be performed in parallel with the ND addition, or a batch addition of data on common places, which leads
to fewer elements, and therefore lower power consumption as well as a higher operation speed.
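The extreme case with a single, repeatedly used adder can be sketched in software. This is a behavioural illustration only: the function name `msb_first_sequences`, the sample data, and the sequential loop are assumptions, and the overlap in time between the ND outputs and the additions is not modelled.

```python
def msb_first_sequences(words, width, count_bits):
    """Yield the integrated sequences Sik in output order, MSB (largest k) first."""
    counts = [sum((w >> place) & 1 for w in words) for place in range(width)]
    for k in reversed(range(count_bits)):
        seq = 0
        for i, count in enumerate(counts):
            seq |= ((count >> k) & 1) << (i + k)   # wiring only, no addition
        yield seq


# 63 seven-bit data sequences (deterministic sample values, assumed for illustration).
data = [(37 * i + 11) % 128 for i in range(63)]

accumulator = 0
for sequence in msb_first_sequences(data, 7, 6):
    # One adder is reused: each sequence is added as soon as it is available.
    accumulator += sequence
print(accumulator)
```

The loop performs six additions in total, one per output bit of the NDs, regardless of how many data sequences are being added.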

[18th embodiment] 

This embodiment is explained below by giving an example of adding a plurality of data sequences having various
numbers of the bits. Fig. 35 is an explanation diagram of this embodiment. Eight n-bit ($8 \ge n \ge 1$) data sequences are
added here. First, data on each place is added together for the eight n-bit data sequences. The first addition is performed
by NDs.

Maximum 8-bit data sequences are used in this embodiment; therefore, eight NDs are used (Z7 indicates a single number,
so strictly seven NDs are required. Y6+Z6 indicates two inputs and an HA can be used, but an ND is used here). Since
processing is performed in parallel in this addition step according to this embodiment, the operation speed is determined
by the slowest operation speed of an ND. Maximum eight inputs are entered into an ND since eight data sequences
are added in this operation. The operation speeds of respective NDs are not identical, and the overall speed is regulated
by the eight-input ND.

Since a carry occurs in a general additive operation, the operation speed is decreased by the carry propagation.
This embodiment, however, has a feature of adding data in parallel without carries, so that a higher operation speed is
achieved. Although this embodiment shows an example of adding eight data sequences having various numbers of
the bits from 1 to 8 bits different from each other, it will be understood that the invention is not limited to this example.

A desired addition result Q is then obtained at a high speed by performing a second addition step for adding all of the
eight addition results represented in the binary mode.

Processing can be made more efficient not by simply adding all data, but by integrating a plurality of data into single data
in performing this addition step. Paying attention to the batch addition results in Fig. 35 from this viewpoint, for example,
places of S70, S51, and S50 do not overlap at all and they can be integrated into single data without any operation. It
is the same as for the second embodiment that no operation is performed in this processing, with only wiring as
processing in the circuit, though it is treated as a step on its algorithm, and that the delay time is so short that it can be ignored
in comparison with other steps. With this efficient method, two pieces of data can be integrated into single data in
the example of Fig. 35. The operation, however, can be made more efficient in the same manner whenever there are two or more
pieces of data.

For further increasing the operation speed, the first and the second addition steps are executed in parallel. In Fig.
35, data on the 8th place, or S70 (Z7), is output earliest as data from an ND, then data on the 7th place, S61 and S60, are
output, and the subsequent data is output in the order of the 6th, the 5th, ... places. Accordingly, in the example of Fig.
35, the calculation of S70 S51 S50 + S61 S60 is executed, for example, without awaiting the completion of
calculating data on the first place. After that, the subsequent output results S42, S41, S40, S12, S11, and S10 are added
(practically, S*2 and S*1 are added). The ND operation corresponding to the first addition step can be executed in
parallel with the operation corresponding to the second addition step, which leads to a higher operation speed.

[19th embodiment]

In this embodiment, a multiplication between multiple bit data is explained by giving an example in Fig. 31. The
description is made below by giving an example of an (8×8)-bit multiplier, and it can be expanded to a general (m×n)-bit
multiplication.

Assume that X × Y = Q, where X (X7 X6 X5 X4 X3 X2 X1 X0) is a multiplicand and Y (Y7 Y6 Y5 Y4 Y3 Y2 Y1 Y0) is a
multiplicator. As described in Figs. 3A and 3B, Q is expressed with maximum 16 bits. Q has the maximum m+n bits
for m×n bits.

As shown in Fig. 31, a partial product X × Yj is created first. Although the partial product is calculated by taking an AND
between each bit Xi of a multiplicand X and a multiplicator Yj like a general CMOS multiplier, any other method,
for example, the method described in the third embodiment, can also be used.

Subsequently, the sums of the partial products on respective places in Fig. 31 are added for each place by NDs
at a time. Since processing is performed in parallel in this addition step, it is suitable for a high speed operation. In an
(m×n)-bit multiplication circuit, (m+n-1) units of NDs are used. The maximum number of the inputs to the NDs is equal
to Min(m,n). In an example of an (8×8)-bit multiplier shown in Fig. 31, 15 NDs are used. Maximum eight inputs are
made (for operation X7Y0 + X6Y1 + X5Y2 + X4Y3 + X3Y4 + X2Y5 + X1Y6 + X0Y7).

This number of NDs applies when NDs are also used for locations where NDs can be replaced by wiring in a one-input
one-output arrangement. If the replaceable NDs are omitted, (m+n-3) NDs are used. Further, if NDs are used
only for three or greater inputs, excluding the locations where NDs can be replaced by two-input two-output HAs (although
an HA is a kind of ND, it is distinguished from an ND), (m+n-5) NDs can be used.

Generally, for three or greater inputs, an additive operation becomes complicated and the operation speed is
lowered due to propagation of carries, which may occur particularly in this state. Since this embodiment has a feature
of performing an operation without carries by adding data together, a high speed operation is achieved.

By performing a second addition step for adding all of the (m+n-1) addition results indicated in the binary mode after
that, a desired multiplication result Q is obtained at a high speed.

Further, to reduce the number of the additions, data is integrated in the same manner as for the 18th embodiment,
which results in rearrangement into four pieces of data (A, B, C, D). As mentioned above, no operation is performed in
this processing, with only wiring as processing in the circuit, though it is treated as a step on its algorithm.

A delay time is so short that it can be ignored in comparison with other steps. The configurations in Figs. 13 and
12 can be applied to NDs and a multiplication circuit, respectively. Pipeline processing can be performed in Fig. 13 as
described in the third embodiment.

The number of the majority logic circuit blocks required by the NDs is expressed by $[\log_2 n]$, where n is the number
of the inputs to the NDs. As for the number of the inputs to the NDs, a value from 1 to Min(m,n) applies in an (m×n)-bit
multiplier, and apparently the operation time becomes the longest at an ND whose input count is Min(m,n), which is
the maximum number of the inputs. This is because the number of the majority logic circuit stages increases with the
number of the inputs n as $[\log_2 n]$. It is apparent, however, that the number of the stages is not increased significantly
when the number of the bits is increased, since it increases with a log function.

Since the operation is performed in parallel, it terminates with a plurality of NDs 74 at the operation speed of the
NDs with the maximum number of the inputs Min(m,n). In this configuration, S73 (A) is output first. Subsequently, it is
added to B, which terminates the operation earlier, but other outputs are not completed at this time. In the same manner,
C is added before D is completely output. The operation can be performed at a high speed due to parallel processing
like this.

This ND configuration leads to a higher operation speed due to a parallel operation and to NDs with lower power
consumption due to fewer elements required, so that the characteristics of the operation methods in the above
embodiments can be significantly improved.

There is a step for integrating data output from a plurality of NDs into one data sequence on an algorithm, but no
processing is performed in circuits as mentioned above, and therefore, there are no circuits corresponding to this step
in Fig. 12.

The above multiplier configuration makes it possible to form a high speed multiplier due to a small number of
elements, low power consumption, and a parallel operation. Additionally, the configuration in Fig. 17 can be applied to
NDs as described in the fourth embodiment.






[20th embodiment] 

According to this embodiment, a plurality of NDs are integrated out of the 15 NDs in Fig. 31 described in the 19th
embodiment for an operation. Although the operation method is explained by giving an example of adding two (8×8)
multiplication results, the invention is not limited to this example.

Its example is shown in Fig. 20 on the basis of a 2×2 majority logic circuit applied to a 2×2 multiplier. In other
words, it is a majority logic circuit in the first stage in Fig. 13 in the 19th embodiment. Data on the first place (x0y0 in a
place of $2^0$) is entered into a unit capacitor C. Two pieces of data on the second place (x1y0 and x0y1 in a place of $2^1$)
are entered into terminals each having two-fold capacity 2C, and therefore, one input is counted as two. Further, data on
the third place (x1y1 in a place of $2^2$) is weighted $2^2$ and one input is counted as four.
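The weighting for the 2×2 case can be checked with a few lines of code. This is only an arithmetic illustration: the function name `weighted_nd_2x2` is an assumption, and the capacitor weights C, 2C, and 4C are modelled as the integer factors 1, 2, and 4.

```python
def weighted_nd_2x2(x, y):
    """Weighted count of the four partial products of a 2x2-bit multiplication."""
    x0, x1 = x & 1, (x >> 1) & 1
    y0, y1 = y & 1, (y >> 1) & 1
    return (1 * (x0 & y0)        # place 2^0, unit capacitor C
            + 2 * (x1 & y0)      # place 2^1, capacitor 2C
            + 2 * (x0 & y1)      # place 2^1, capacitor 2C
            + 4 * (x1 & y1))     # place 2^2, capacitor 4C


results = {(x, y): weighted_nd_2x2(x, y) for x in range(4) for y in range(4)}
print(results)
```

With these weights a single weighted count already equals the product, so one integrated ND replaces the three per-place NDs of the unweighted arrangement.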

Other description is the same as for the 19th embodiment, and up to seven inputs are output in the binary mode
with four inputs in each ND. By using this weighting method, the parallel batch addition function can be used more
efficiently.

For example, in the (8×8)-bit multiplier in the 19th embodiment, the NDs for adding data on places whose weights
are 0, 1, 2, and 3 are integrated into an ND91 as shown in Fig. 36, and the areas on places whose weights are (4, 5,
6), (7, 8, 9), and (10, 11, 12, 13, 14) are integrated into each area (ND92 to ND94 in Fig. 21).

Although the number of the inputs and that of the outputs are indicated by numbers for NDs in Fig. 36, any ND
can be used provided that it can count up to 56 inputs. For wiring, up to 21 inputs are applied. Respective NDs generate 6-bit
outputs. Therefore, fewer NDs are required and the elements can be significantly reduced. In an addition of result Q of
another (8×8) multiplication, S105 is output, and S105' for S105 can be added while an ND94 calculates S104. It can
also be applied to other data S35 and S45, and S104 and S104' can be added to their sum subsequently. The same
calculation can be applied to S103 and S103', and a partial sum S" can be generated by an ND operation and a parallel
operation. Only three pieces of data, P, Q, and R, should be added finally. Furthermore, if S"46 from the
ND92 is added to P separately, P and Q can be integrated into one data sequence; therefore, only two pieces of data
should be added.

Accordingly, a high speed operation is achieved and the elements can be significantly reduced because an adder
can be used repeatedly. Particularly, a use of NDs having the above majority logic circuit is effective due to its clock
operation. This configuration of the multiplication circuit makes it possible to reduce the elements further, to reduce
power consumption in a small-sized circuit, and to form a higher speed multiplier due to parallel operations and fewer
addition stages.

Although contiguous places are integrated in the example of general weighting in this embodiment, the invention
is not limited to this method, and any more efficient method can be flexibly used, such as discontinuous weighting
(for example, when data on the $2^0$ place and on the $2^2$ place are integrated to be entered) or dividing data on one place
to enter it with different weights for different NDs (for example, dividing data on $2^8$ places into two classes and entering
them into different NDs).

[21st embodiment]

In this embodiment, the above operation methods used for a DSP are described by giving an example of a data 
processor having a semiconductor device for performing the operation. 

Although a DSP for a fixed-point operation, which is a typical DSP, is explained in this embodiment, it will be
understood that the invention is not limited to this and it can be applied to other types of DSPs and CPUs.

The processor in the above embodiments is extremely compatible since it can be formed by using a general
semiconductor MOS transistor. Accordingly, a previous semiconductor device can be used as its substitute with an
attachment of an input-output buffer.

The configuration of the DSP according to this embodiment is illustrated in Fig. 37. A multiplier and an accumulator
are mounted on the DSP as operation units. The multiplier is used to multiply two pieces of 16-bit data to obtain 31-bit
outputs. The accumulator comprises a 16-bit arithmetic and logic unit (ALU) and a register for storing output signals
from the ALU.

There are four types of on-chip memories described below. A data RAM is used to store input signals, whose
address is specified by an 8-bit data pointer (DP). The lower four bits of the DP are treated by a 4-bit up/down counter,
and the upper four bits are treated by a 4-bit register. A data ROM is used for storing weighting factors. Its address is
specified by a ROM pointer (RP) of the 10-bit down counter. A 16-bit temporary register (TR) is used for storing data
temporarily. A command ROM is used for storing instructions, and its address is specified by an instruction counter (PC).

The signal reception or transmission with an external area of the DSP is performed via a 1-bit serial output register,
a 1-bit serial input register, and an 8-bit parallel input-output register. A serial output or a serial input is executed in
synchronization with serial input-output clocks (SCK) when an output control signal (SOEN) or an input control signal
(SIEN) is 0 V, respectively. An 8-bit parallel output is performed after setting a write control signal (WR) or a read
control signal (RD) to 0 V when a read/write control signal (CS) is 0 V. When data for eight bits to be output from an
SO is stored in the serial output register, an output ready signal (SORQ) is set to 5 V.
Respective instructions are read out from the ROM every clock cycle by a specification of the program counter
(PC). An operation unit or a memory operates according to the decoding result of each read instruction.

If a reset pulse (RST) is applied, the pointer position of the PC is set to address 0 and the DSP starts its operation.
If an interrupt pulse (INT) is applied subsequently, the PC pointer position is jumped to address 256. Selections of an
input-output mode (8-bit or 16-bit) and those of whether or not an interrupt should be accepted are determined by a
16-bit status register (SR) in the 8-bit parallel input-output register. A clock driver generates 2-phase clocks T0 and
T2 based on clock pulses (CLK) from an external area and supplies them to operation units or memories. Data is
transmitted among the input-output register, the operation units, and memories via 16-bit buses.

In this embodiment, the above 16-bit × 16-bit high speed multiplier is formed on the same circuit board and in the
same process as for other logical circuits and memories.

The actual operation timing of this DSP is described below by giving an example of a 12-stage pipeline product
and sum operation. Fig. 36 shows 2-phase clock pulses during operation. Input signals or weighting factors
stored in the data RAM or data ROM are read out, in other words, data is supplied to the multiplier, when
T0 of clock cycle m becomes high (5 V) (hereinafter the T0 timing), and a multiplication is executed subsequently. The
multiplication result is latched to a register at the T0 timing of the next clock cycle (m+1). Simultaneously, a multiplication
of subsequent data is processed in parallel in the multiplier.
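The overlap between latching one product and multiplying the next operands can be sketched as a simple cycle-by-cycle model. This is a hypothetical two-stage simplification for illustration, not the DSP's actual pipeline: the function name `pipelined_mac`, the variable names, and the sample values are assumptions, and each loop iteration stands for one clock cycle.

```python
def pipelined_mac(samples, weights):
    """Product-and-sum with a latch between the multiplier and the accumulator."""
    product_latch = 0   # register that holds the previous cycle's product
    accumulator = 0
    # One extra cycle with zero operands drains the last product from the latch.
    operands = list(zip(samples, weights)) + [(0, 0)]
    for x, w in operands:
        accumulator += product_latch   # accumulate the product latched last cycle
        product_latch = x * w          # multiplier works on the current operands
    return accumulator


samples = [3, -1, 4, 1, -5]    # input signals (arbitrary sample values)
weights = [2, 7, -1, 8, 2]     # weighting factors (arbitrary sample values)
result = pipelined_mac(samples, weights)
print(result)
```

In this model N operand pairs take N+1 cycles, so the multiplier latency is hidden except for the single drain cycle at the end.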

Generally, an operation speed is decreased with an increase of the number of the bits, and particularly the
multiplication speed contributes to it. By using a high speed multiplier according to this embodiment, however, the operation
speed can be increased, which leads to a remarkable improvement of the DSP performance. In addition, it has a merit
that it can be formed by a general CMOS process. Although the operation method is applied to the DSP as a multiplier in
this embodiment, the invention is naturally not limited to this, and it is apparent that it can also be applied to a variety
of operation circuits containing an addition or multiplication process for a plurality of multiple bits as examples of other
applications, taking into consideration its versatility in that inputs and outputs or processes are the same as for general
CMOS processes. This invention has many effects, such as reducing chip areas and power consumption as
well as increasing the operation speed.

[22nd embodiment] 

In this embodiment, the above processor is applied to a correlation operation unit in a reception circuit for a spread
spectrum communication (SS communication). The configuration of this reception circuit is shown in Fig. 39. As shown
in Fig. 39, the reception circuit comprises a reception antenna 1401, amplifier units 1402 for amplifying signals,
correlation operation units 1403A and 1403B, an A/D conversion unit 1404, a selector 1405, and a detector 1406.

In the SS communication, signals are converted to multiple bit codes called PN codes and the PN codes are 
transmitted. In the reception circuit, similar PN codes previously stored are compared with the received signals and 
states of the highest correlation are detected to demodulate the transmitted signals. 
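The comparison against a stored PN code amounts to repeated additions of chip-by-chip products. The following is a minimal sketch, assuming chips are represented as +1/-1 values and that the search is over cyclic shifts of the stored code; the function name `correlate` and this representation are illustrative and do not describe the circuit of Fig. 39.

```python
def correlate(received, pn_code):
    """Score every cyclic shift of pn_code against the received chips.

    Chips are +1 or -1. Returns (best_shift, scores), where best_shift is
    the shift with the highest correlation score.
    """
    n = len(pn_code)
    scores = []
    for shift in range(n):
        # Each score is a sum of n chip products, i.e. repeated additions.
        scores.append(sum(received[i] * pn_code[(i + shift) % n]
                          for i in range(n)))
    best_shift = max(range(n), key=lambda s: scores[s])
    return best_shift, scores
```

When the received chips are an exact cyclic rotation of the stored code, the score at the matching shift equals the code length, and that shift is reported as the state of highest correlation.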

Referring to Fig. 39, a signal received by the antenna 1401 is demodulated primarily by the detector 1406, and
one output is transmitted to the correlation operation unit 1403A and the other is converted to a digital signal in the A/D
conversion unit 1404 and then entered into the correlation operation unit 1403B. The entered signal is compared with a PN
code previously stored in the reception circuit; the correlation operation unit 1403A generates a synchronizing signal
from the correlation degree between the two signals, and the correlation operation unit 1403B calculates a correlation
score in synchronization with the synchronizing signal. Then, the signal is demodulated by the selector 1405 based on the
correlation score output from the correlation operation unit 1403B.

Although SS communication has the excellent features of high call privacy and noise protection
because signals are converted to multiple bit codes before they are transmitted, it has the problem that tremendous
loads are imposed on actual signal processing as the amount of transmitted information increases, since detecting
highly correlative states by comparing the received signals with PN codes requires repeated additions in an
addition circuit as shown in Fig. 45.

By performing these additions in the processors according to the above embodiments, however, it is possible to
form an SS communication reception circuit with fewer elements and low power consumption as well as high operation
speed. Accordingly, it is possible to produce a portable information device for radio communication in the SS
communication system.

Furthermore, a compact card-type reception/transmission unit 2001 as shown in Fig. 40 can be produced, since
the higher operation speed makes it possible to communicate a large amount of information with fewer
elements and low power consumption. Therefore, it becomes easier to apply SS communication to a PCMCIA card
having an interface for a conventional personal computer. Although a PCMCIA card is used in this example, this invention
can be easily applied to other interfaces. The inputs and outputs are CMOS-compatible as usual; therefore, downsizing
and lower power consumption can be easily achieved by the above processor.

Although the description is made by giving an example of a data processor for the SS communication, the input-output
interface is CMOS-compatible as mentioned above, and this invention can be applied to other data processors
for adding a plurality of multiple bit data, particularly including a negative value, for example the DSPs or CPUs mentioned
above or a parallel operation processing unit for processing images or voices. In addition, it can also be used for
statistical processing for obtaining averages or standard deviations and for numeric operations such as the least squares
method.



Furthermore, it is effective in significantly improving various systems such as a wireless LAN, an I/O management system, an
accounting system, and a video conference system, owing to the merits of higher operation speed, downsizing,
and lower power consumption.

According to the embodiments mentioned above, a plurality of multiple bit data can be added at a high speed.
In addition, a plurality of multiple bit data including a negative value can be added at a high speed.
Further, a plurality of multiple bit data can be multiplied at a high speed.
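The two-step addition recited in the claims, a first addition that counts the 1s on each common place and a second addition that sums those counts, can be sketched in software as follows. The function name `add_by_column_counts`, the 8-bit default width, and the treatment of the sign (flag) bit column as carrying weight -2^(bits-1) are assumptions for illustration; the last is simply the standard interpretation of a 2's complement word and is not necessarily how the circuit handles negative values.

```python
def add_by_column_counts(values, bits=8):
    """Add signed integers held as `bits`-wide 2's complement words.

    Returns (total, counts), where counts[pos] is the number of words
    with a 1 on place `pos`.
    """
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    assert all(lo <= v <= hi for v in values)
    # Encode each value as a 2's complement word of the given width.
    words = [v & ((1 << bits) - 1) for v in values]
    # First addition: per place, count how many words have a 1 there.
    counts = [sum((w >> pos) & 1 for w in words) for pos in range(bits)]
    # Second addition: sum the counts weighted by place. The sign (flag)
    # bit column has negative weight in 2's complement (an assumption here).
    total = sum(c << pos for pos, c in enumerate(counts[:-1]))
    total -= counts[-1] << (bits - 1)
    return total, counts
```

Calling `add_by_column_counts([5, -3, 7, -8])` yields the same total as ordinary signed addition of the four values, while `counts` exposes the intermediate per-place results that the second addition combines.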

Still further, a processor for operating a plurality of multiple bit data at a high speed can be formed in a small chip 
area by using a semiconductor device with low power consumption. 

The processor can be applied to various data processors since it can be produced in a general semiconductor
process, and therefore it is possible to produce data processors such as DSPs, CPUs, and reception/transmission
units for the SS communication as devices with high operation speed and low power consumption having small chip
areas.



Although the present invention has been described in its preferred form with a certain degree of particularity, many 
apparently widely different embodiments of the invention can be made without departing from the spirit and the scope 
thereof. It is to be understood that the invention is not limited to the specific embodiments hereof except as defined in
the appended claims. 
Claims

1. A processor for adding a plurality of multiple bit data, comprising:

first addition means for adding data together on common places of said plurality of multiple bit data; and
second addition means for calculating a sum of the addition results obtained by said first addition means.



2. A processor according to Claim 1, wherein said first addition means adds data on respective places individually
for said plurality of multiple bit data.

3. A processor according to Claim 1, wherein said first addition means adds data on a plurality of places for said
plurality of multiple bit data.

4. A processor according to Claim 1, wherein said second addition means integrates a plurality of addition results
with no common places into single data out of the addition results of said first addition means in an additive operation.



5. A processor according to Claim 1, wherein said first addition means enters values on the common places of said
plurality of multiple bit data in parallel and includes count detection means for detecting the number of inputs having
value 1 and outputting the count in binary notation.

6. A processor according to Claim 5, wherein outputs from a plurality of said count detection means are entered into
at least one other count detection means.

7. A processor according to Claim 5, wherein said count detection means includes a plurality of majority logic operation
means.



8. A processor according to Claim 7, wherein at least one of said plurality of majority logic operation means includes a
plurality of input terminals, a plurality of capacitor means connected via said plurality of input terminals and switch
means, and a sense amplifier to which said plurality of capacitor means are connected in common.

9. A processor according to Claim 8, wherein a capacity of specific capacitor means is equal to a capacity of a plurality
of other capacitor means in said plurality of capacitor means.

10. A processor according to Claim 8, wherein outputs from said sense amplifier are entered into at least one of said
plurality of input terminals with a feedback.

11. A processor according to Claim 5, wherein outputs from said sense amplifier are connected to at least one of said
plurality of input terminals via latch means.

12. A processor according to Claim 1, wherein said plurality of multiple bit data include a flag bit representing a sign
and said first addition means adds data on respective places of said plurality of multiple bit data including the flag
bit individually.

13. A processor according to Claim 12, wherein said plurality of multiple bit data include a negative value represented
by a 2's complement.

14. A processor according to Claim 12, wherein said plurality of multiple bit data include a negative value represented
by a 1's complement.

15. A processor according to Claim 1, wherein said second addition means integrates data on the nth (n>0) place
counted from each place in the addition results of different places generated by said first addition means into single
data in an additive operation.

16. A processor according to Claim 1, wherein said second addition means integrates data on the nth (n>0) place
counted from each place in the addition results for each place generated by said first addition means.

17. A processor according to Claim 15, wherein said nth place corresponds to the most significant place of the addition
results of respective places.

18. A processor according to Claim 17, wherein said first addition means outputs addition results from the most
significant place sequentially.

19. A processor according to Claim 1, wherein said second addition means executes an addition by using addition
results for partial places already executed by said first addition means in parallel with an addition for other partial
places executed by said first addition means.

20. A processor according to Claim 19, wherein said second addition means adds addition results executed
by said first addition means to each other.

21. A processor according to Claim 19, wherein said second addition means adds addition results executed by said
first addition means to addition results already executed by said second addition means.

22. A processor for multiplying a plurality of multiple bit data, comprising:

a partial product generation means for generating partial products of said plurality of multiple bit data;
a first addition means for integrating data on common places of a plurality of partial products generated by
said partial product generation means to add data on respective places individually; and
a second addition means for calculating a sum of the addition results generated by said first addition means.

23. A processor according to Claim 22, wherein said partial product generation means generates partial products of
respective bits of first multiple bit data and a specific bit of second multiple bit data simultaneously.

24. A processor according to Claim 22, wherein said partial product generation means comprises:

input means for entering respective bits of said first multiple bit data in parallel; and
switch means for setting on or off for inputs from said input means according to a value of each bit of said
second multiple bit data.

25. A processor according to Claim 22, wherein said partial product generation means includes a plurality of transistors
whose gate electrodes are connected in common.

26. A processor according to Claim 22, wherein said partial product generation means includes a plurality of AND
circuits.



27. A processor according to Claim 22, wherein said second addition means integrates data on the nth (n>0) place
counted from each place in the addition results of different places generated by said first addition means into single
data in an additive operation.

28. A processor according to Claim 22, wherein said second addition means executes an addition by using addition
results for partial places already executed by said first addition means in parallel with an addition for other partial
places executed by said first addition means.

29. An operation method of adding a plurality of multiple bit data, comprising the steps of:

a first addition of adding data on common places of said plurality of multiple bit data together; and
a second addition of calculating a sum of the addition results generated by said first addition step.

30. An operation method according to Claim 29, wherein a plurality of addition results with no common places are
integrated into single data out of the addition results of said first addition step in said second addition step before
being added.

31. An operation method according to Claim 29, wherein said plurality of multiple bit data include a flag bit representing
a sign and, in said first addition step, data is added on respective places of said plurality of multiple bit data including
the flag bit individually.

32. An operation method according to Claim 29, wherein, in said second addition step, data on the nth
(n>0) place counted from each place in the addition results of different places generated by said first addition step
is integrated into single data before being added.

33. An operation method according to Claim 29, wherein an addition in said second addition step is executed by using
addition results for partial places already executed in said first addition step in parallel with an addition for other
partial places executed in said first addition step.

34. An operation method of multiplying a plurality of multiple bit data, comprising the steps of:

partial product generation for generating partial products of said plurality of multiple bit data;
first addition of adding data on common places together for a plurality of partial products generated in said
partial product generation step; and
second addition of calculating a sum of addition results generated in said first addition step.

35. An operation method according to Claim 34, wherein, in said second addition step, a plurality of addition results
with no common places are integrated into single data out of the addition results generated in said first addition
step before being added.

36. An operation method according to Claim 34, wherein, in said second addition step, data on the nth
(n>0) place counted from each place in the addition results of different places generated by said first addition step
is integrated into single data before being added.

37. An operation method according to Claim 34, wherein an addition in said second addition step is executed by using
addition results for partial places already executed in said first addition step in parallel with an addition for other
partial places executed in said first addition step.

38. A data processor comprising:

input means for entering data;
storing means for storing data;
processing means for processing data stored by said storing means and data entered from said input means
in a given processing procedure; and
output means for outputting processing results from said processing means;
wherein said processing means comprises
first addition means for adding data on common places to each other in a plurality of multiple bit data at a time; and
second addition means for calculating a sum of the addition results generated by said first addition means.

39. A data processor according to Claim 38, said processing means further comprising partial product generation
means for generating partial products of a plurality of multiple bit data, wherein partial products generated by said
partial product generation means are added by using said first and second addition means.

40. A data processor according to Claim 38, wherein said input means enters signals, said storing means stores
weighting factors, and said processing means multiplies the entered signals by respective weighting factors for
accumulation.

41. A data processor according to Claim 38, wherein said input means enters multiple bit signs, said storing means
previously stores multiple bit signs, and said processing means calculates a correlation amount between the entered
multiple bit signs and the stored multiple bit signs and demodulates the entered multiple bit signs on the basis
of the calculated correlation amount.

42. An adding circuit for multiple multi-bit data wherein (i) for each bit position of the operands a multi-bit count value
is formed indicating the number of 1s in that position over all operands, so as to obtain several partial sums
overlapping in bit positions, and (ii) the partial sums are added to obtain the desired sum.

43. An adding circuit according to claim 42 wherein in step (ii) non-overlapping partial sums are concatenated so as
to produce fewer full-width partial sums to be added.

44. An adding circuit according to claim 42 or 43 wherein in step (i) blocks of bit positions are handled together (91,
92, 93).

45. A multiplying circuit wherein partial products are added using an adding circuit according to any of claims 42 to 44. 

46. A multiplying circuit wherein partial products of multi-bit binary numbers are generated using transmission gates
rather than full AND gates.

47. A data processing device, apparatus or method having the features of any combination of the preceding claims. 
Drawings (figures not reproduced): FIG. 2, FIG. 3A, FIG. 3B, FIG. 5, FIG. 9, FIG. 11, FIG. 12, FIG. 15, FIG. 18